
Conversation


@pftq pftq commented Apr 23, 2025

Either (1) merge chaojie's PR first and then this one, or (2) merge this one only, since it already includes chaojie's changes as a PR-merge into this PR. What won't work is merging this one first and then chaojie's afterwards.

I updated the generate_video files to support the following:

  • Added seed synchronization code to allow a random seed with multi-GPU (Multi-GPU Results in Static Noise After Initial Frames #24); see the first sketch after this list.
  • Reduced the 20-min+ multi-GPU load time to ~8 min by fixing contention (all GPUs loading models at once); see the second sketch after this list. This indirectly also resolved the CPU RAM spike during multi-GPU (>200 GB on 4 GPUs) (Multi-GPU Initialization Takes 20 Min on 8 GPUs (Fix Provided) #28).
  • Fixed the CuSolver error that occasionally comes up in multi-GPU by presetting the linear algebra library, also shown in the second sketch below (Occasional CuSolver Error on Multi-GPU - Bug and Fix #37).
  • Removed a duplicate model-loading line in the I2V pipeline.
  • Added a batch_size parameter to allow generating multiple videos without reloading the model; reloading takes about 20 min on multi-GPU, so this saves a lot of time.
  • Added a preserve_image_aspect_ratio parameter to preserve the original image aspect ratio.
  • Fixed the DF script not resize-cropping the input image (the I2V script does this, but the DF script was missing the code).
  • Exposed negative_prompt so it can be changed/overridden.
  • Friendlier output filenames, with the date, seed, cfg, steps, and other details up front.
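
For reference, the seed synchronization boils down to something like this (a minimal sketch; the function name and exact placement are illustrative, not necessarily what this PR does):

import torch
import torch.distributed as dist

def synchronize_seed(seed=None):
    # Rank 0 draws (or keeps) the seed; every other rank receives the same
    # value, so all GPUs start from identical noise instead of diverging
    # into static after the first frames.
    if seed is None:
        seed = int(torch.randint(0, 2**31 - 1, (1,)).item())
    if dist.is_available() and dist.is_initialized():
        seed_tensor = torch.tensor([seed], dtype=torch.int64, device="cuda")
        dist.broadcast(seed_tensor, src=0)
        seed = int(seed_tensor.item())
    torch.manual_seed(seed)
    return seed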
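
Similarly, the load-contention fix and the linear-algebra preset amount to roughly the following (an illustrative sketch with hypothetical names; which backend is actually pinned here is an assumption):

import torch
import torch.distributed as dist

# Pin the linear-algebra backend up front so PyTorch doesn't switch into
# CuSolver mid-run (assumption: "magma" as the pinned choice; see #37).
torch.backends.cuda.preferred_linalg_library("magma")

def load_models_staggered(load_fn):
    # Let ranks load one at a time instead of all GPUs hitting the disk and
    # CPU RAM at once, which is what stretched startup past 20 minutes.
    rank = dist.get_rank() if dist.is_initialized() else 0
    world_size = dist.get_world_size() if dist.is_initialized() else 1
    model = None
    for turn in range(world_size):
        if rank == turn:
            model = load_fn()  # e.g. the pipeline constructor for this rank
        if dist.is_initialized():
            dist.barrier()  # next rank waits until this one finishes
    return model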

This also includes and cleanly integrates chaojie's fork (#12):

  • Prompt travel: pass multiple text strings to the --prompt parameter to guide the video differently for each chunk of base_num_frames (see the sketch after this list).
  • Video input via the --video parameter, allowing you to continue/extend from an existing video.
  • Partially complete videos are written out as each chunk of base_num_frames finishes. In combination with the --video parameter, this lets you effectively resume from a previous render, or abort mid-render if the video takes a turn you don't like. Extremely useful for saving time and "watching" as the render progresses rather than committing the full time up front.
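
As a rough illustration of how a list of --prompt strings maps onto chunks (a sketch with a hypothetical helper, not the fork's exact logic):

def prompt_for_chunk(prompts, chunk_index):
    # Each chunk of base_num_frames advances to the next prompt string;
    # once the list runs out, the last prompt keeps applying.
    return prompts[min(chunk_index, len(prompts) - 1)]

prompts = ["The first thing he does",
           "The second thing he does.",
           "The third thing he does."]
print(prompt_for_chunk(prompts, 0))  # first chunk uses the first prompt
print(prompt_for_chunk(prompts, 4))  # later chunks reuse the last prompt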

Let me know if there is anything you want changed for the PR. I still think you have the best open-source model so far; it's just really hard for an average user to get good results without a lot of debugging, so I'm happy to help out.

Multi-GPU with video input and prompt travel, batch of 10, preserving aspect ratio.
Change --video "video.mp4" to --image "image.jpg" if you want to load a starting image instead.

model_id=Skywork/SkyReels-V2-DF-14B-540P
gpu_count=2
torchrun --nproc_per_node=${gpu_count} generate_video_df.py \
  --model_id ${model_id} \
  --resolution 540P \
  --ar_step 0 \
  --base_num_frames 97 \
  --num_frames 257 \
  --overlap_history 17 \
  --inference_steps 50 \
  --guidance_scale 6 \
  --batch_size 10 \
  --preserve_image_aspect_ratio \
  --video "video.mp4" \
  --prompt "The first thing he does" \
  "The second thing he does." \
  "The third thing he does." \
  --negative_prompt "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards" \
  --addnoise_condition 20 \
  --use_usp \
  --offload

Single GPU with video input and prompt travel, batch of 10, preserving aspect ratio.
Change --video "video.mp4" to --image "image.jpg" if you want to load a starting image instead.

model_id=Skywork/SkyReels-V2-DF-14B-540P
python3 generate_video_df.py \
  --model_id ${model_id} \
  --resolution 540P \
  --ar_step 0 \
  --base_num_frames 97 \
  --num_frames 257 \
  --overlap_history 17 \
  --inference_steps 50 \
  --guidance_scale 6 \
  --batch_size 10 \
  --preserve_image_aspect_ratio \
  --video "video.mp4" \
  --prompt "The first thing he does" \
  "The second thing he does." \
  "The third thing he does." \
  --negative_prompt "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards" \
  --addnoise_condition 20 \
  --offload

chaojie and others added 29 commits April 22, 2025 03:44
python3 generate_video_df.py   --model_id ${model_id}   --resolution 540P   --ar_step 0   --base_num_frames 97   --num_frames 177   --overlap_history 17    --addnoise_condition 20   --offload --prompt 'A woman in a leather jacket and sunglasses riding a vintage motorcycle through a desert highway at sunset, her hair blowing wildly in the wind as the motorcycle kicks up dust, with the golden sun casting long shadows across the barren landscape.' 'A woman flies into space'
Added batch mode, added option to keep original aspect ratio, synchronized seeds on multi-gpu.
…nized randomized seeds on multi-gpu, exposed negative_prompt option.
@pftq pftq changed the title Batch Mode + Maintain Aspect Ratio + Multi-GPU Random Seed + Fixed Multi-GPU CuSolver Error + Fixed 20-min Load Time Batch Mode + Maintain Aspect Ratio + Multi-GPU Random Seed + Fixed Multi-GPU CuSolver Error + Fixed 20-min Load Time + Video Input May 3, 2025
@pftq pftq mentioned this pull request May 6, 2025

innokria commented May 6, 2025

I have the latest code; however, I still can't generate a video with multiple prompts in sequence:
https://huggingface.co/spaces/rahul7star/Skyreel-V2-Enchance

Are there any extra params I need to pass?


pftq commented May 6, 2025

What is your command-line prompt, and what is the error when you try to run?


innokria commented May 6, 2025

> What is your command-line prompt, and what is the error when you try to run?

No error, just that the video is not rendering the second part.

Hmm, I am running it via Gradio:
https://huggingface.co/spaces/rahul7star/Skyreel-V2-Enchance/blob/main/app.py

'A woman in a leather jacket and sunglasses riding a vintage motorcycle through a desert highway at sunset, her hair blowing wildly in the wind as the motorcycle kicks up dust, with the golden sun casting long shadows across the barren landscape.' 'A woman flies into space'

Let me run generate_video_df.py directly and see, rather than going through my custom Gradio wrapper.


pftq commented May 6, 2025

I'm not familiar with Gradio. Have you tried running it via the command line like the example in the readme? A lot of this fork's changes are in the generate py file, so you'd have to replicate those in your custom code.

What are your num_frames and base_num_frames? It'd be good to know your full parameter list. Each additional prompt is assigned to a chunk, with the number of chunks roughly num_frames / base_num_frames, so you need to make sure you have enough chunks to reach the next prompt.

If you're running the fork directly via the command line, you can also check the debug output, which says which prompt/chunk it is currently on.
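
For a rough sense of the numbers, here is an estimate of the chunk count using the example settings from this PR (an approximation that assumes each chunk after the first reuses overlap_history frames; the script's debug output is the authoritative count):

import math

num_frames = 257        # total frames requested
base_num_frames = 97    # frames generated per chunk
overlap_history = 17    # frames carried over between chunks

# First chunk, plus however many more chunks are needed to cover the rest at
# (base_num_frames - overlap_history) new frames each.
chunks = 1 + math.ceil((num_frames - base_num_frames) / (base_num_frames - overlap_history))
print(chunks)  # 3, so there is room for up to three prompts in the list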


innokria commented May 6, 2025

> I'm not familiar with Gradio. Have you tried running it via the command line like the example in the readme? A lot of this fork's changes are in the generate py file, so you'd have to replicate those in your custom code.
>
> What are your num_frames and base_num_frames? It'd be good to know your full parameter list. Each additional prompt is assigned to a chunk, with the number of chunks roughly num_frames / base_num_frames, so you need to make sure you have enough chunks to reach the next prompt.
>
> If you're running the fork directly via the command line, you can also check the debug output, which says which prompt/chunk it is currently on.

Alright, going to use exactly your settings. My video length was 5 sec, so probably too few frames; I will set the video to 10 sec. Thanks for your help, I will keep messing with it :)


qiwang1996 commented May 6, 2025

Thanks for your good work! I tested your code on 4x 4090 (48 GB) and set the CPU offload option. However, there is still a CPU RAM spike during multi-GPU (>200 GB on 4 GPUs). I wonder if I'm doing something wrong.

model_id="./SkyReels-V2-DF-14B-540P"
gpu_count=4
torchrun --nproc_per_node=${gpu_count} generate_video_df.py \
  --model_id ${model_id} \
  --resolution 540P \
  --ar_step 0 \
  --base_num_frames 97 \
  --num_frames 289 \
  --overlap_history 17 \
  --inference_steps 50 \
  --guidance_scale 6 \
  --batch_size 1 \
  --preserve_image_aspect_ratio \
  --prompt  "A graceful white swan with a curved neck and delicate feathers swimming in a serene lake at dawn, its reflection perfectly mirrored in the still water as mist rises from the surface, with the swan occasionally dipping its head into the water to feed." \
  --negative_prompt "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards" \
  --addnoise_condition 20 \
  --use_ret_steps \
  --teacache_thresh 0.0 \
  --use_usp \
  --offload

My script is above.


pftq commented May 6, 2025

Thanks. It's hard to be sure on that issue since I didn't directly set out to solve it. I just know my RunPod instance with 4x A40s was crashing previously, and afterwards it did not. I wonder if it's partly because you still have RAM to spare, so maybe the system is not as stringent about clearing the memory.

@qiwang1996

@pftq Thanks for your fast reply. I guess the difference in RAM management strategy between RunPod and AutoDL, where I run my code, caused it.


innokria commented May 7, 2025

> Thanks for your good work! I tested your code on 4x 4090 (48 GB) and set the CPU offload option. However, there is still a CPU RAM spike during multi-GPU (>200 GB on 4 GPUs). I wonder if I'm doing something wrong.
>
> My script is above.

I even got: "Script timed out after 15 minutes. Try reducing frame count or prompt complexity."


pftq commented May 8, 2025

That's not an error message from anywhere in this code repo. If you are embedding this in a custom script or environment, you would need to look there for the issue. Additionally, that is the multi-GPU code, which is quite complex, so I don't recommend embedding it in another wrapper.
