
Blueprint for federated finetuning, enabling multiple data owners to collaboratively fine-tune models without sharing raw data. Developed in collaboration with Flower.

Project logo

Federated Fine-Tuning Blueprint with Flower

Tests Ruff

Large language models (LLMs) trained on vast amounts of publicly accessible data have become remarkably capable.

However, the availability of high-quality public data is decreasing. Federated AI enables multiple data owners to collaboratively fine-tune models without sharing raw data, unlocking access to distributed private datasets.

This blueprint demonstrates Federated Fine-Tuning of LLMs using Flower, a framework for federated learning. We fine-tune the Qwen2-0.5B-Instruct model on the Alpaca-GPT4 dataset using PEFT-based LoRA adapters.

📘 To explore this project further and discover other Blueprints, visit the Blueprints Hub.

📋 Pre-requisites

  • System requirements:
    • OS: Linux
    • Python 3.10 or higher
    • Minimum RAM: 8GB (recommended for LLM fine-tuning)
    • All dependencies are listed in pyproject.toml.

Built with

Python Hugging Face Streamlit Flower

🚀 Quick Start

1️⃣ Clone the Project

git clone https://github.com/mozilla-ai/federated-finetuning.git
cd federated-finetuning

2️⃣ Update Submodule and Install Dependencies

git submodule update --init --recursive  # Fetch submodules, if the repository defines any
pip install -e .  # Install root project dependencies

3️⃣ Run Federated Fine-Tuning with Flower

You can run your Flower project in both simulation and deployment mode without changing the code. If you are new to Flower, we recommend starting with simulation mode, as it requires fewer components to be launched manually. By default, flwr run uses the Simulation Engine. We provide a run.sh script for this Blueprint. The default configuration runs FL simulation with a 4-bit quantized Qwen2-0.5B-Instruct model, with 2 clients participating per round for 100 FL rounds. You can override configuration parameters either by changing them in pyproject.toml or by passing them directly to flwr run.

Run with the Simulation Engine (Recommended)

Note

Check the Simulation Engine documentation to learn more about Flower simulations and how to optimize them.

It is highly recommended that you run this example on a GPU, as it will take much longer on CPU. The default commands run the federated fine-tuning on GPU.

flwr run .

# Run for 10 rounds and increase the fraction of clients that participate per round to 25%
flwr run . --run-config "num-server-rounds=10 strategy.fraction-fit=0.25"

If you want to run federated fine-tuning on CPU instead, you can run the following, which disables quantization and sets the number of GPUs per client to 0:

# Set the number of CPUs per client to the maximum available for faster processing.
flwr run . --run-config "model.quantization=0" --federation-config "options.num-supernodes=20 options.backend.client-resources.num-gpus=0.0 options.backend.client-resources.num-cpus=4"

Run with the Deployment Engine

Follow this how-to guide to run the same app in this example but with Flower's Deployment Engine. After that, you might be interested in setting up secure TLS-enabled communications and SuperNode authentication in your federation.

If you are already familiar with how the Deployment Engine works, you may want to learn how to run it using Docker. Check out the Flower with Docker documentation.

Test Your Fine-Tuned Model

To generate text responses from your trained model:

python demo/generate_response.py --peft-path=/path/to/trained-model \
    --question="What is the ideal 1-day plan in London?"

🎭 Interactive Demo: Streamlit App

We provide a Streamlit web demo for testing the fine-tuned model in real time. This assumes that you have SSH'd into the remote machine.

Run the Demo

streamlit run demo/app.py --server.address=0.0.0.0 --server.port=8501 -- --model-path <MODEL_PATH> # <RESULTS><TIMESTAMP><PEFT_#>

Once running, open your browser and go to:

http://localhost:8501

Here, you can input a question and receive model-generated responses.

Local Fine-Tuning

Local fine-tuning involves training a model on a single dataset partition instead of multiple federated clients. This provides a baseline for evaluating the effectiveness of federated fine-tuning. The process follows these steps:

  1. Dataset Selection: The dataset is loaded and split into smaller partitions to mimic client data distributions. A single partition (1/n of the full dataset) is used for training.
  2. Model Configuration: A 4-bit quantized model is loaded and prepared using PEFT-based LoRA adapters.
  3. Training: The model is fine-tuned on the assigned dataset partition using SFTTrainer.
  4. Evaluation: The resulting model is tested and compared against federated fine-tuned models to assess performance differences.

Local fine-tuning is executed using:

python src/fine-tune-local.py

After training, the fine-tuned model is stored in results/ for later evaluation.
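
To make these steps concrete, below is a minimal sketch of local fine-tuning on a single partition. It is an illustration rather than the repository's own src/fine-tune-local.py: the Hugging Face dataset id (vicgalle/alpaca-gpt4), the prompt format, and all hyperparameters are assumptions, and it expects recent versions of transformers, peft, trl, and flwr-datasets, plus a CUDA GPU for 4-bit quantization.

# Illustrative sketch only -- not the repository's fine-tune-local.py.
import torch
from flwr_datasets import FederatedDataset
from flwr_datasets.partitioner import IidPartitioner
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

# 1. Load a single partition (1/20 of Alpaca-GPT4) to mimic one client's local data.
fds = FederatedDataset(
    dataset="vicgalle/alpaca-gpt4",  # assumed Hugging Face id for Alpaca-GPT4
    partitioners={"train": IidPartitioner(num_partitions=20)},
)
partition = fds.load_partition(0, "train")
partition = partition.map(
    lambda ex: {"text": f"### Instruction:\n{ex['instruction']}\n\n### Response:\n{ex['output']}"}
)

# 2. Load the base model in 4-bit and attach lightweight LoRA adapters.
quant = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-0.5B-Instruct", quantization_config=quant, device_map="auto"
)
model = prepare_model_for_kbit_training(model)
model = get_peft_model(
    model,
    LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"),
)

# 3. Fine-tune on the partition with SFTTrainer, then save the adapters for later evaluation.
trainer = SFTTrainer(
    model=model,
    train_dataset=partition,
    args=SFTConfig(
        output_dir="results/local-sketch",
        dataset_text_field="text",
        max_steps=100,
        per_device_train_batch_size=4,
    ),
)
trainer.train()
trainer.save_model("results/local-sketch")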

We also provide ./demo/run.sh, which runs federated fine-tuning with default settings, then local fine-tuning, and finally tests both fine-tuned models on benchmark data.

Expected results

Device Resources

You can adjust the CPU/GPU resources assigned to each client based on your device. For example, with the 0.5-billion-parameter model it is easy to train 2 concurrent clients on a single GPU (8 GB VRAM): assign 50% of the GPU's VRAM to each client by setting options.backend.client-resources.num-gpus = 0.5 under [tool.flwr.federations.local-simulation] in pyproject.toml.

πŸ› οΈ How It Works

This project integrates:

  1. Dataset Handling - Uses Flower Datasets to download, partition, and preprocess the dataset.
  2. Model Fine-Tuning - Uses the PEFT library to fine-tune the Qwen2-0.5B-Instruct model with LoRA adapters, making fine-tuning efficient and lightweight.
  3. Federated Training via Flower - Uses Flower's Simulation Engine to orchestrate training across distributed clients, which allows fine-tuning on a single GPU by running simulations.
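
For orientation, here is a stripped-down sketch of the general shape a Flower ClientApp/ServerApp pair takes, assuming a recent Flower release (1.10+). The repository's client_app.py and server_app.py presumably follow a similar structure, but additionally load the dataset partition, exchange LoRA adapter weights, and run supervised fine-tuning inside fit(); those parts are reduced to comments here.

# Skeleton of a Flower app (illustration only; the real apps train LoRA adapters in fit()
# and live in separate client/server modules).
from flwr.client import ClientApp, NumPyClient
from flwr.common import Context
from flwr.server import ServerApp, ServerAppComponents, ServerConfig
from flwr.server.strategy import FedAvg


class LLMClient(NumPyClient):
    def get_parameters(self, config):
        # Real client: return the current LoRA adapter weights as NumPy arrays.
        return []

    def fit(self, parameters, config):
        # Real client: load `parameters` into the PEFT model, fine-tune on the local
        # partition with SFTTrainer, and return the updated adapter weights.
        return parameters, 1, {}


def client_fn(context: Context):
    # In simulation, context.node_config["partition-id"] tells each client which
    # dataset partition to train on.
    return LLMClient().to_client()


def server_fn(context: Context):
    # FedAvg averages the weight updates returned by the sampled clients each round.
    strategy = FedAvg(fraction_fit=0.1, fraction_evaluate=0.0)
    return ServerAppComponents(strategy=strategy, config=ServerConfig(num_rounds=100))


app = ClientApp(client_fn=client_fn)
server = ServerApp(server_fn=server_fn)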

Project Structure

federated-finetuning-blueprint
├── src/
│   ├── flowertune-llm/
│   │   ├── client_app.py    # Defines the Flower client
│   │   ├── server_app.py    # Defines the Flower server
│   │   ├── dataset.py       # Handles data loading and preprocessing
│   │   ├── models.py        # Defines the fine-tuning models
│   ├── benchmarks/general-nlp/
│   │   ├── benchmarks.py    # Defines benchmarks
│   │   ├── eval.py          # Evaluates the fine-tuned model
│   │   ├── utils.py         # Useful functions
│   ├── fine-tune-local.py   # Fine-tunes a local model
│   ├── plot_results.py      # Visualization script
│
├── demo/
│   ├── generate_response.py  # Script to generate model responses
│   ├── app.py                # Run streamlit demo
│
├── pyproject.toml  # Project metadata and dependencies

❓ Troubleshooting

Installation Issues:

Ensure dependencies are installed:

pip install -e .

CUDA Issues:

If running on GPU, verify CUDA installation:

python -c "import torch; print(torch.cuda.is_available())"

Slow Training:

  • Lower num-server-rounds in pyproject.toml.

📜 License

This project is licensed under the Apache 2.0 License. See the LICENSE file for details.

🤝 Contributing

Contributions are welcome! To get started, check out the CONTRIBUTING.md file.
