Releases: minimaxir/gpt-2-simple
v0.8.1: TensorFlow 2 support
Thanks to https://github.com/YaleDHLab via #275, gpt-2-simple now supports TensorFlow 2 by default, and the minimum TensorFlow version is now 2.5.1! The Colab Notebook has also been update to no longer use TensorFlow 1.X.
Note: Development on gpt-2-simple has mostly been superseded by aitextgen, which has similar AI text generation capabilities with more efficient training time and resource usage. If you do not require TensorFlow, I recommend using aitextgen instead. Checkpoints trained with gpt-2-simple can also be loaded in aitextgen.
Fix model URL
Remove finetuning asserts
Some users have successfully finetuned the 774M and 1558M models, so the assert has been removed.
Multi-GPU support + TF 2.0 assert
Handle 774M (large)
- 774M is explicitly blocked from being finetuned and will trigger an assert if attempted. If a way to finetune it without being super-painful is added, the ability to finetune it will be restored.
- Allow generating text from the default pretrained models by passing `model_name` to `gpt2.load_gpt2()` and `gpt2.generate()` (this will work with 774M); see the sketch after this list.
- Add `sgd` as an `optimizer` parameter to `finetune` (default: `adam`).
- Support for the changed model names, with the changes made more prominent in the README.
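A minimal sketch of generating from a pretrained model by name (the prefix text is illustrative):

```python
import gpt_2_simple as gpt2

# Download the pretrained 774M model and generate from it directly,
# without any finetuning.
gpt2.download_gpt2(model_name="774M")

sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, model_name="774M")
gpt2.generate(sess, model_name="774M", prefix="The meaning of life is")
```

The new `optimizer` parameter is passed the same way during training, e.g. `gpt2.finetune(sess, "shakespeare.txt", model_name="124M", optimizer="sgd")` in a fresh session (the file name is illustrative).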
Polish before TF 2.0
Remove assertion
The assertion was triggering false positives, so it has been removed.
Prevent OOB + Cap Gen Length
Minor fix to prevent an issue encountered with gpt-2-cloud-run.
A goal of this release was to allow resetting the graph without resetting the model parameters; that did not seem to work, so that feature is being held back for now.
Fixed prefix + miscellaneous bug fixes
Merged PRs, including a fix for the prefix issue (see the commits for more info).
A bunch of highly-requested features
Adapted a few functions from Neil Shepperd's fork:
- Nucleus Sampling (`top_p`) when generating text, which produces surprisingly different output (setting `top_p=0.9` works well). Supersedes `top_k` when used. (#51)
- An `encode_dataset()` function to pre-encode and compress a large dataset before loading it for finetuning. (#19, #54; both are sketched below)
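A sketch of both additions, assuming a plaintext training file named `dataset.txt`; the `out_path` argument name and the step count are illustrative and may differ slightly from the library's defaults:

```python
import gpt_2_simple as gpt2

# Pre-encode and compress a large plaintext dataset so it does not need
# to be re-tokenized on every finetuning run.
gpt2.encode_dataset("dataset.txt", out_path="dataset_encoded.npz")

sess = gpt2.start_tf_sess()
gpt2.finetune(sess, "dataset_encoded.npz", model_name="124M", steps=500)

# Nucleus sampling: sample only from the smallest set of tokens whose
# cumulative probability exceeds top_p; this supersedes top_k when set.
gpt2.generate(sess, top_p=0.9)
```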
Improvements to continuing model training:
- `overwrite` argument for `finetune`: with `restore_from="latest"`, this continues model training without creating a duplicate copy of the model, and is therefore good for transfer learning using multiple datasets (#20; example below).
- You can continue to `finetune` a model without having the original GPT-2 model present.
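For example, to continue training an existing run on a second dataset without duplicating the checkpoint folder (a sketch; `dataset2.txt` and the step count are placeholders):

```python
import gpt_2_simple as gpt2

sess = gpt2.start_tf_sess()

# Restore the latest checkpoint of the existing run and overwrite it in
# place rather than creating a duplicate copy of the model.
gpt2.finetune(sess,
              "dataset2.txt",
              model_name="124M",
              restore_from="latest",
              overwrite=True,
              steps=500)
```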
Improvements with I/O involving Colaboratory
- Checkpoint folders are now packaged into a `.tar` file when copying to Google Drive, and when copying from Google Drive, the `.tar` file is automatically unpackaged into the correct checkpoint format. (You can pass `copy_folder=True` to the `copy_checkpoint` function to revert to the old behavior.) (#37: thanks @woctezuma!)
- `copy_checkpoint_to_gdrive` and `copy_checkpoint_from_gdrive` now take a `run_name` argument instead of a `checkpoint_folder` argument (see the sketch below).
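In Colaboratory, the round trip to Google Drive is now keyed by run name (a sketch assuming the default run name `run1`):

```python
import gpt_2_simple as gpt2

# Mount Google Drive inside the Colaboratory notebook.
gpt2.mount_gdrive()

# Package checkpoint/run1 into a .tar file and copy it to Google Drive.
gpt2.copy_checkpoint_to_gdrive(run_name="run1")

# In a later session: copy the .tar file back from Google Drive and
# unpack it into checkpoint/run1.
gpt2.copy_checkpoint_from_gdrive(run_name="run1")
```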
Miscellaneous
- Added CLI arguments for `top_k`, `top_p`, and `overwrite`.
- Cleaned up redundant function parameters (#39)