Scale-wise Distillation of Diffusion Models


⚡️ SwD is twice as fast as leading distillation methods
⚡️ SwD surpasses leading distillation methods within the same computational budget

💡 Quick introduction

The paper introduces Scale-wise Distillation (SwD), a novel framework for accelerating diffusion models (DMs) by progressively increasing spatial resolution during the generation process. SwD achieves significant speedups (2.5× to 10×) compared to full-resolution models while maintaining or even improving image quality.

Human evaluation shows that SwD is highly competitive and often outperforms the baselines, while generating images with higher complexity than leading approaches.

🔥 News

  • Jun, 2025: 🔥 We have released the FLUX checkpoint. Check the demo.
  • Apr, 2025: 🤗 SwD + LCM flow matching scheduler has been integrated into the diffusers library: link

🔧 TODO

  • Training code
  • FLUX
  • ComfyUI support
  • Inference with SD3.5

🔥 Inference

HF 🤗 Models

We release three versions of SwD: Medium (2B) and Large (8B), both distilled from SD3.5, and FLUX (12B).
SwD requires two key hyperparameters: `scales` and `sigmas`.

  • The scales hyperparameter defines the spatial resolution at which predictions are performed during the generation process. It specifies the sequence of resolutions (e.g., starting from a lower resolution like 256×256 and progressively increasing to the target resolution, such as 1024×1024).
  • The sigmas hyperparameter controls the noise levels applied at each step of the diffusion process. It is equivalent to diffusion timesteps.
| Model | Scales | Sigmas |
|---|---|---|
| SwD 2B, 6 steps | 32, 48, 64, 80, 96, 128 | 1.0000, 0.9454, 0.8959, 0.7904, 0.7371, 0.6022, 0.0000 |
| SwD 2B, 4 steps | 32, 64, 96, 128 | 1.0000, 0.9454, 0.7904, 0.6022, 0.0000 |
| SwD 8B, 6 steps | 32, 48, 64, 80, 96, 128 | 1.0000, 0.9454, 0.8959, 0.7904, 0.7371, 0.6022, 0.0000 |
| SwD 8B, 4 steps | 64, 80, 96, 128 | 1.0000, 0.8959, 0.7371, 0.6022, 0.0000 |
| SwD FLUX, 4 steps | 64, 80, 96, 128 | 1.0000, 0.8956, 0.7363, 0.6007, 0.0000 |
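The scales are latent-space sizes: both the SD3.5 and FLUX VAEs downsample by a factor of 8, which is why the inference snippets set `height`/`width` to `scales[0] * 8`. A minimal sketch of that mapping (plain Python, nothing beyond the numbers in the table):

```python
# Convert SwD latent-space scales to pixel resolutions.
# The VAE downsamples by a factor of 8, so a latent size of 32
# corresponds to 256x256 pixels and 128 to 1024x1024.
VAE_FACTOR = 8

def to_pixels(scales):
    return [s * VAE_FACTOR for s in scales]

scales_6_step = [32, 48, 64, 80, 96, 128]
print(to_pixels(scales_6_step))  # [256, 384, 512, 640, 768, 1024]
```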

Upgrade to the latest versions of 🧨 diffusers and 🤗 peft:

pip install -U diffusers
pip install -U peft

and then you can run

📌 FLUX

import torch
from diffusers import FluxPipeline
from peft import PeftModel

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", 
                                    torch_dtype=torch.float16, 
                                    custom_pipeline='quickjkee/swd_pipeline_flux').to('cuda')
distill_check = 'yresearch/swd_flux'
pipe.transformer = PeftModel.from_pretrained(
    pipe.transformer,
    distill_check,
)

sigmas = [1.0000, 0.8956, 0.7363, 0.6007, 0.0000]
scales = [64, 80, 96, 128]
prompt = 'a cat reading a newspaper'

image = pipe(
    prompt=prompt,
    height=int(scales[0] * 8),
    width=int(scales[0] * 8),
    scales=scales,
    sigmas=sigmas,
    timesteps=torch.tensor(sigmas[:-1]).to('cuda') * 1000,
    guidance_scale=4.5,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0]
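The `timesteps` argument above is simply the sigma schedule rescaled by 1000 (the usual convention for flow-matching models), with the final sigma of 0.0 dropped because it denotes the clean sample and needs no denoising step. A plain-Python sketch of that line:

```python
# Derive per-step timesteps from the sigma schedule (no torch needed
# for the arithmetic itself): t = sigma * 1000, skipping the last
# sigma (0.0), which is the clean sample.
sigmas = [1.0000, 0.8956, 0.7363, 0.6007, 0.0000]
timesteps = [s * 1000 for s in sigmas[:-1]]  # one entry per denoising step
```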

📌 Stable Diffusion 3.5 Large

You may need to restrict the visible device (e.g., `%env CUDA_VISIBLE_DEVICES=0`) for the LoRAs to load correctly.

import torch
from diffusers.schedulers.scheduling_flow_match_lcm import FlowMatchLCMScheduler
from diffusers import StableDiffusion3Pipeline
from peft import PeftModel

# Loading models
pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3.5-large", 
                                                torch_dtype=torch.float16)
pipe = pipe.to("cuda")
lora_path = 'yresearch/swd-large-6-steps'
pipe.transformer = PeftModel.from_pretrained(
    pipe.transformer,
    lora_path,
)

# LCM scheduler
pipe.scheduler = FlowMatchLCMScheduler.from_config(pipe.scheduler.config, 
                                                   shift=1.0)

# Setting up the scale factors.
# They are defined w.r.t. the input latent size,
# in this case: 32 -> 48 -> 64 -> 80 -> 96 -> 128
pipe.scheduler.set_scale_factors(
    scale_factors=[1.5, 2., 2.5, 3., 4.], 
    upscale_mode='bicubic'
)

# Generation
image = pipe(
    "a cat reading a newspaper",
    sigmas=[1.0000, 0.9454, 0.8959, 0.7904, 0.7371, 0.6022],
    guidance_scale=0.0,
    generator=torch.Generator().manual_seed(0),
    width=256,
    height=256
).images[0]
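As a quick sanity check, the scale factors above reproduce the 6-step schedule from the table when applied to the input latent size (32 for a 256×256 input); plain Python, no library calls:

```python
# The scale factors are relative to the input latent size:
# width=256 / VAE factor 8 = 32, and 32 * [1.5, 2., 2.5, 3., 4.]
# yields the 6-step schedule 32 -> 48 -> 64 -> 80 -> 96 -> 128.
base = 32
scale_factors = [1.5, 2., 2.5, 3., 4.]
schedule = [base] + [int(base * f) for f in scale_factors]
print(schedule)  # [32, 48, 64, 80, 96, 128]
```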

(Deprecated) Alternatively, you can use a custom pipeline. Its scale handling differs slightly: the absolute scale schedule is passed directly via `scales`, instead of setting scale factors on the scheduler.

import torch
from diffusers import StableDiffusion3Pipeline
from peft import PeftModel

pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3.5-large",
                                                torch_dtype=torch.float16,
                                                custom_pipeline='quickjkee/swd_pipeline')
pipe = pipe.to("cuda")
lora_path = 'yresearch/swd-large-6-steps'
pipe.transformer = PeftModel.from_pretrained(
    pipe.transformer,
    lora_path,
).to("cuda")

generator = torch.Generator().manual_seed(0)
prompt = 'a cat reading a newspaper'
sigmas = [1.0000, 0.9454, 0.8959, 0.7904, 0.7371, 0.6022, 0.0000]
scales = [32, 48, 64, 80, 96, 128]

images = pipe(
    prompt,
    sigmas=torch.tensor(sigmas).to('cuda'),
    timesteps=torch.tensor(sigmas[:-1]).to('cuda') * 1000,
    scales=scales,
    guidance_scale=0.0,
    height=int(scales[0] * 8),
    width=int(scales[0] * 8),
    generator=generator,
).images

🔧 Training

Coming soon!

Citation

@article{starodubcev2025swd,
  title={Scale-wise Distillation of Diffusion Models},
  author={Nikita Starodubcev and Denis Kuznedelev and Artem Babenko and Dmitry Baranchuk},
  journal={arXiv preprint arXiv:2503.16397},
  year={2025}
}