[examples] add train flux-controlnet scripts in example. #9324
The new README for the FLUX ControlNet example:

@@ -0,0 +1,177 @@
# ControlNet training example for FLUX

The `train_controlnet_flux.py` script shows how to implement the ControlNet training procedure and adapt it for [FLUX](https://github.com/black-forest-labs/flux).

Training script provided by LibAI, an institution dedicated to the progress and achievement of artificial general intelligence. LibAI is the developer of [cutout.pro](https://www.cutout.pro/) and [promeai.pro](https://www.promeai.pro/).
## Running locally with PyTorch

### Installing the dependencies

Before running the scripts, make sure to install the library's training dependencies:

**Important**

To make sure you can successfully run the latest versions of the example scripts, we highly recommend **installing from source** and keeping the install up to date as we update the example scripts frequently and install some example-specific requirements. To do this, execute the following steps in a new virtual environment:
```bash
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install -e .
```

Then cd into the `examples/controlnet` folder and run

```bash
pip install -r requirements_flux.txt
```
And initialize an [🤗Accelerate](https://github.com/huggingface/accelerate/) environment with:

```bash
accelerate config
```

Or, for a default accelerate configuration without answering questions about your environment:

```bash
accelerate config default
```

Or, if your environment doesn't support an interactive shell (e.g., a notebook):

```python
from accelerate.utils import write_basic_config
write_basic_config()
```

When running `accelerate config`, setting torch compile mode to True can give dramatic speedups.
## Custom Datasets

We support importing data from a JSON Lines file (`xxx.jsonl`). Here is a brief example:

```json
{"image_path": "xxx", "caption": "xxx", "control_path": "xxx"}
{"image_path": "xxx", "caption": "xxx", "control_path": "xxx"}
```
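
To produce such a file from your own image/caption/conditioning triplets, a minimal sketch might look like this (the paths and captions below are placeholders for your data):

```python
import json

# Placeholder rows; substitute your own image paths, captions, and conditioning-image paths.
rows = [
    {"image_path": "images/0001.png", "caption": "red circle with blue background", "control_path": "conditions/0001.png"},
    {"image_path": "images/0002.png", "caption": "cyan circle with brown floral background", "control_path": "conditions/0002.png"},
]

# JSON Lines format: one JSON object per line.
with open("train.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
```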
## Training

Our training examples use two test conditioning images. They can be downloaded by running:

```sh
wget https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_1.png
wget https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_2.png
```

Then run `huggingface-cli login` to log into your Hugging Face account. This is needed to be able to push the trained ControlNet parameters to the Hugging Face Hub.

We can set `num_double_layers` and `num_single_layers`, which determine the size of the ControlNet (the defaults are `num_double_layers=4` and `num_single_layers=10`).
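For intuition, here is a rough sketch of how the script derives the ControlNet from the base transformer using these flags; it mirrors what `train_controlnet_flux.py` does, but treat the exact keyword arguments as an assumption rather than a stable API:

```python
import torch
from diffusers import FluxTransformer2DModel
from diffusers.models.controlnet_flux import FluxControlNetModel

# Load the frozen base transformer from the FLUX checkpoint.
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="transformer", torch_dtype=torch.bfloat16
)

# Initialize a ControlNet whose weights are copied from the transformer's first blocks.
# num_layers sets the double-stream blocks, num_single_layers the single-stream blocks;
# fewer blocks yield a smaller, cheaper-to-train ControlNet.
controlnet = FluxControlNetModel.from_transformer(
    transformer, num_layers=4, num_single_layers=0
)
```

Run training with: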
```bash
export MODEL_DIR="black-forest-labs/FLUX.1-dev"
export OUTPUT_DIR="path to save model"
export TRAIN_JSON_FILE="path to your jsonl file"

accelerate launch train_controlnet_flux.py \
--pretrained_model_name_or_path=$MODEL_DIR \
--conditioning_image_column=control_path \
--image_column=image_path \
--caption_column=caption \
--output_dir=$OUTPUT_DIR \
--jsonl_for_train=$TRAIN_JSON_FILE \
--mixed_precision="bf16" \
--resolution=512 \
--learning_rate=1e-5 \
--max_train_steps=15000 \
--validation_steps=100 \
--checkpointing_steps=200 \
--validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
--validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--report_to="tensorboard" \
--num_double_layers=4 \
--num_single_layers=0 \
--seed=42
```
To better track our training experiments, we're using the following flags in the command above:

* `report_to="tensorboard"` will ensure the training runs are tracked with TensorBoard.
* `validation_image`, `validation_prompt`, and `validation_steps` allow the script to do a few validation inference runs, which lets us qualitatively check whether training is progressing as expected.
Our experiments were conducted on a single 80GB A100 GPU (an earlier revision of this text said 40GB in error). From the discussion on this PR: even with extra memory-saving tricks, usage is around 70GB at 1024 resolution with batch size 1 (or 512 resolution with batch size 3), so fitting training into 40GB requires DeepSpeed ZeRO-3. The script already caches the text-encoder outputs, which saves roughly 10GB of GPU memory for the T5 encoder, and VAE latents can be cached as well; the authors note that a DeepSpeed walkthrough covering this was later added to the README.
### Inference

Once training is done, we can perform inference like so:
```python
import torch
from diffusers.utils import load_image
from diffusers.pipelines.flux.pipeline_flux_controlnet import FluxControlNetPipeline
from diffusers.models.controlnet_flux import FluxControlNetModel

base_model = 'black-forest-labs/FLUX.1-dev'
controlnet_model = 'path to controlnet'

controlnet = FluxControlNetModel.from_pretrained(controlnet_model, torch_dtype=torch.bfloat16)
pipe = FluxControlNetPipeline.from_pretrained(
    base_model,
    controlnet=controlnet,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

control_image = load_image("./conditioning_image_1.png").resize((1024, 1024))

prompt = "pale golden rod circle with old lace background"

image = pipe(
    prompt,
    control_image=control_image,
    controlnet_conditioning_scale=0.6,
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("./output.png")
```
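
If you are memory constrained at inference time, one option is to replace `pipe.to("cuda")` with CPU offloading via the standard diffusers pipeline API (a suggestion, not part of the original example):

```python
# Move submodules to the GPU only while they are in use, trading speed for memory.
pipe.enable_model_cpu_offload()
```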
## Notes

### T5 doesn't support bf16 autocast (causes black images)

Running the T5 text encoder under bf16 autocast produces black images, for reasons we haven't identified, so the script disables autocast during validation:

```diff
if is_final_validation or torch.backends.mps.is_available():
    autocast_ctx = nullcontext()
else:
    # t5 seems not to support autocast, and we don't know why
+    autocast_ctx = nullcontext()
-    autocast_ctx = torch.autocast(accelerator.device.type)
```
### Fixing a dtype error

If you encounter the following error:

```bash
RuntimeError: mat1 and mat2 must have the same dtype, but got Float and BFloat16
```

change the following code in `diffusers/src/diffusers/pipelines/flux/pipeline_flux_controlnet.py` to ensure the ControlNet block samples match the latents' dtype:
```diff
noise_pred = self.transformer(
    hidden_states=latents,
    # YiYi notes: divide it by 1000 for now because we scale it by 1000 in the transformer model (we should not keep it but I want to keep the inputs same for the model for testing)
    timestep=timestep / 1000,
    guidance=guidance,
    pooled_projections=pooled_prompt_embeds,
    encoder_hidden_states=prompt_embeds,
-    controlnet_block_samples=controlnet_block_samples,
-    controlnet_single_block_samples=controlnet_single_block_samples,
+    controlnet_block_samples=[sample.to(dtype=latents.dtype) for sample in controlnet_block_samples] if controlnet_block_samples is not None else None,
+    controlnet_single_block_samples=[sample.to(dtype=latents.dtype) for sample in controlnet_single_block_samples] if controlnet_single_block_samples is not None else None,
    txt_ids=text_ids,
    img_ids=latent_image_ids,
    joint_attention_kwargs=self.joint_attention_kwargs,
    return_dict=False,
)[0]
```
The new `requirements_flux.txt` referenced above:

@@ -0,0 +1,9 @@
accelerate>=0.16.0 | ||
torchvision | ||
transformers>=4.25.1 | ||
ftfy | ||
tensorboard | ||
Jinja2 | ||
datasets | ||
wandb | ||
SentencePiece