FLAN-T5, released with the paper Scaling Instruction-Finetuned Language Models, is an enhanced version of T5 that has been fine-tuned on a mixture of tasks, or, put simply, a better T5 model in every respect. FLAN-T5 outperforms T5 by double-digit margins at the same number of parameters. Google has open-sourced 5 checkpoints on Hugging Face, ranging from 80M to 11B parameters.
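As a quick illustration, the sketch below loads one of those checkpoints with the Hugging Face transformers library. The model id `google/flan-t5-xl` is one of the published checkpoints; the prompt and generation settings are just illustrative.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load one of the open-sourced FLAN-T5 checkpoints from the Hugging Face Hub.
# Other sizes follow the same naming scheme, e.g. google/flan-t5-small or google/flan-t5-xxl.
model_id = "google/flan-t5-xl"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Quick sanity check: FLAN-T5 can follow simple instructions zero-shot.
inputs = tokenizer("Translate English to German: How old are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```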
In a previous blog post, we already learned how to “Fine-tune FLAN-T5 XL/XXL using DeepSpeed & Hugging Face Transformers”. In this blog post, we look at how to integrate DeepSpeed with Amazon SageMaker so that any practitioner can train these billion-parameter models with a simple API call. Amazon SageMaker managed training lets you train large language models without having to manage the underlying infrastructure. You can find more information about Amazon SageMaker in the documentation.
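To give a feel for what that API call looks like, here is a minimal sketch using the `HuggingFace` estimator from the SageMaker Python SDK. The entry point script, instance type, container versions, and hyperparameters are illustrative placeholders you would adapt to your own setup.

```python
import sagemaker
from sagemaker.huggingface import HuggingFace

# IAM role with SageMaker permissions (when running outside a SageMaker notebook,
# pass the role ARN explicitly instead).
role = sagemaker.get_execution_role()

huggingface_estimator = HuggingFace(
    entry_point="train.py",           # hypothetical training script that launches DeepSpeed
    source_dir="./scripts",           # hypothetical directory containing the script
    instance_type="ml.p4d.24xlarge",  # 8x A100 40GB; adjust to your needs and quota
    instance_count=1,
    role=role,
    transformers_version="4.17",      # example versions; pick a supported combination
    pytorch_version="1.10",
    py_version="py38",
    hyperparameters={
        "model_id": "google/flan-t5-xxl",
        "epochs": 3,
    },
)

# Kick off the managed training job; SageMaker provisions and tears down the infrastructure.
huggingface_estimator.fit()
```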
This means we will learn how to fine-tune FLAN-T5 XL & XXL using model parallelism, multiple GPUs, and DeepSpeed ZeRO on Amazon SageMaker.
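For context, DeepSpeed ZeRO is typically configured through a JSON file that the training script hands to the Hugging Face Trainer. The sketch below shows an assumed ZeRO stage 3 configuration with CPU offloading; the file name is made up and the "auto" placeholders are resolved by the Trainer integration at runtime, so treat it as a starting point rather than the exact configuration for FLAN-T5 XL/XXL.

```python
import json

# A minimal sketch of a DeepSpeed ZeRO stage 3 config with optimizer and parameter
# offloading to CPU. "auto" values are filled in by the Hugging Face Trainer.
ds_config = {
    "bf16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

# Hypothetical file name; the training script would pass this path via --deepspeed.
with open("ds_flan_t5_z3_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```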