Large language models (LLMs) have achieved impressive capabilities by training on vast amounts of publicly accessible data.
However, the availability of high-quality public data is decreasing. Federated AI enables multiple data owners to collaboratively fine-tune models without sharing raw data, unlocking access to distributed private datasets.
This blueprint demonstrates Federated Fine-Tuning of LLMs using Flower, a framework for federated learning. We fine-tune Qwen2-0.5B-Instruct model on the Alpaca-GPT4 dataset using PEFT-based LoRA adapters.
To explore this project further and discover other Blueprints, visit the Blueprints Hub.
System requirements:
- OS: Linux
- Python 3.10 or higher
- Minimum RAM: 8 GB (recommended for LLM fine-tuning)

All dependencies are listed in pyproject.toml.
git clone https://github.com/mozilla-ai/federated-finetuning.git
cd federated-finetuning
pip install -e . # Install root project dependencies
You can run your Flower project in both simulation and deployment mode without changing the code. If you are new to Flower, we recommend starting with simulation mode, as it requires fewer components to be launched manually. By default, flwr run uses the Simulation Engine. We provide a run.sh script for this Blueprint. The default run launches an FL simulation with a 4-bit Qwen2-0.5B-Instruct model, 2 clients per round, and 100 FL rounds. You can override configuration parameters either by changing them in pyproject.toml or by passing them directly to flwr run, as shown below.
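For reference, the run defaults typically live in a config table of pyproject.toml. The excerpt below is an illustrative sketch (the table name and values are assumptions; check this Blueprint's pyproject.toml for the actual keys and defaults):

[tool.flwr.app.config]
num-server-rounds = 100        # default number of FL rounds
strategy.fraction-fit = 0.1    # fraction of clients sampled per round (illustrative value)
model.quantization = 4         # 4-bit quantization by default; set to 0 to disable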
Note
Check the Simulation Engine documentation to learn more about Flower simulations and how to optimize them.
It is highly recommended that you run this example on a GPU, as it will take much longer on CPU. The default commands run the federated fine-tuning on GPU.
flwr run .
# Run for 10 rounds, increasing the fraction of clients that participate per round to 25%
flwr run . --run-config "num-server-rounds=10 strategy.fraction-fit=0.25"
Nevertheless, if you want to run federated fine-tuning on CPU, you can run the following, which disables quantization and sets the number of GPUs to 0:
# Set number of CPUs to max available for faster processing.
flwr run . --run-config "model.quantization=0" --federation-config "options.num-supernodes=20 options.backend.client-resources.num-gpus=0.0 options.backend.client-resources.num-cpus=4"
Follow this how-to guide to run the same app in this example but with Flower's Deployment Engine. After that, you might be interested in setting up secure TLS-enabled communications and SuperNode authentication in your federation.
If you are already familiar with how the Deployment Engine works, you may want to learn how to run it using Docker. Check out the Flower with Docker documentation.
To generate text responses from your trained model:
python demo/generate_response.py --peft-path=/path/to/trained-model \
--question="What is the ideal 1-day plan in London?"
We provide a Streamlit web demo for testing the fine-tuned model in real time. This assumes that you have SSH'd into the remote machine.
streamlit run demo/app.py --server.address=0.0.0.0 --server.port=8501 -- --model-path <MODEL_PATH> # <RESULTS><TIMESTAMP><PEFT_#>
Once running, open your browser and go to:
http://localhost:8501
Here, you can input a question and receive model-generated responses.
Local fine-tuning involves training a model on a single dataset partition instead of multiple federated clients. This provides a baseline for evaluating the effectiveness of federated fine-tuning. The process follows these steps:
- Dataset Selection: The dataset is loaded and split into smaller partitions to mimic client data distributions. A single partition (1/n of the full dataset) is used for training.
- Model Configuration: A 4-bit quantized model is loaded and prepared using PEFT-based LoRA adapters.
- Training: The model is fine-tuned on the assigned dataset partition using SFTTrainer (a minimal sketch of these steps follows the list).
- Evaluation: The resulting model is tested and compared against federated fine-tuned models to assess performance differences.
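A minimal sketch of these steps, assuming the Hugging Face transformers/peft/trl stack (the dataset identifier, partition count, and hyperparameters are illustrative; the actual implementation lives in src/fine-tune-local.py):

import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

model_name = "Qwen/Qwen2-0.5B-Instruct"

# Use a single partition (here 1/20 of the data) to mimic one client's shard.
dataset = load_dataset("vicgalle/alpaca-gpt4", split="train").shard(num_shards=20, index=0)

# Load the base model with 4-bit quantization and prepare it for k-bit training.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config)
model = prepare_model_for_kbit_training(model)

# Attach lightweight LoRA adapters via PEFT.
lora_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)

# Fine-tune with SFTTrainer and keep the adapter under results/ for later evaluation.
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    args=SFTConfig(output_dir="results/local", dataset_text_field="text", max_steps=100),
)
trainer.train()
trainer.save_model("results/local")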
Local fine-tuning is executed using:
python src/fine-tune-local.py
After training, the fine-tuned model is stored in results/ for later evaluation.
We also provide ./demo/run.sh, which runs federated fine-tuning with the default settings followed by local fine-tuning, and then tests both fine-tuned models on benchmark data.
You can adjust the CPU/GPU resources assigned to each client based on your device. For example, with the 0.5-billion-parameter model it is easy to train 2 concurrent clients on each GPU (8 GB VRAM): assign 50% of the GPU's VRAM to each client by setting options.backend.clientapp-gpus = 0.5 under [tool.flwr.federations.local-simulation] in pyproject.toml.
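The relevant excerpt of pyproject.toml would look roughly like this (illustrative; keep the key name consistent with the one this project's pyproject.toml already uses):

[tool.flwr.federations.local-simulation]
# Each simulated client may claim up to 50% of a GPU, so two clients fit per GPU.
options.backend.clientapp-gpus = 0.5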
This project integrates:
- Dataset Handling - Uses Flower Datasets to download, partition, and preprocess the dataset (see the sketch after this list).
- Model Fine-Tuning - Uses the PEFT library to fine-tune the Qwen2-0.5B-Instruct model. It also supports LoRA adapters to make fine-tuning efficient and lightweight.
- Federated Training via Flower - Uses Flower's Simulation Engine to orchestrate training across distributed clients, and allows fine-tuning on a single GPU by running simulations.
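As a rough illustration of the dataset-handling piece, here is a minimal Flower Datasets sketch (the dataset identifier and partition count are assumptions; the real logic lives in src/flowertune-llm/dataset.py):

from flwr_datasets import FederatedDataset
from flwr_datasets.partitioner import IidPartitioner

# Split the training data into 20 IID partitions, one per simulated client.
partitioner = IidPartitioner(num_partitions=20)
fds = FederatedDataset(dataset="vicgalle/alpaca-gpt4", partitioners={"train": partitioner})

# Each client only ever loads its own partition.
partition = fds.load_partition(0, "train")
print(partition)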
federated-finetuning-blueprint
├── src/
│   ├── flowertune-llm/
│   │   ├── client_app.py          # Defines the Flower client
│   │   ├── server_app.py          # Defines the Flower server
│   │   ├── dataset.py             # Handles data loading and preprocessing
│   │   └── models.py              # Defines the fine-tuning models
│   ├── benchmarks/general-nlp/
│   │   ├── benchmarks.py          # Defines benchmarks
│   │   ├── eval.py                # Evaluates the fine-tuned model
│   │   └── utils.py               # Useful functions
│   ├── fine-tune-local.py         # Fine-tunes a local model
│   └── plot_results.py            # Visualization script
│
├── demo/
│   ├── generate_response.py       # Script to generate model responses
│   └── app.py                     # Run streamlit demo
│
└── pyproject.toml                 # Project metadata and dependencies
Ensure dependencies are installed:
pip install -e .
If running on GPU, verify CUDA installation:
python -c "import torch; print(torch.cuda.is_available())"
- If training takes too long, lower num-server-rounds in pyproject.toml.
This project is licensed under the Apache 2.0 License. See the LICENSE file for details.
Contributing
Contributions are welcome! To get started, check out the CONTRIBUTING.md file.