Qwen Image Layered Support #12853
Conversation
sayakpaul
left a comment
Thanks a lot for this PR! My comments should mostly be minor.
LMK if anything is unclear.
Let's also add a test and update the docs?
```python
image = torch.cat(image, dim=2)  # b c f h w
image = image.permute(0, 2, 3, 4, 1)  # b f h w c
image = (image * 0.5 + 0.5).clamp(0, 1).cpu().float().numpy()
image = (image * 255).round().astype("uint8")
images = []
for layers in image:
    images.append([Image.fromarray(layer) for layer in layers])
```
Wondering if we can leverage self.image_processor.postprocess here? This is what we usually follow for the pipelines? If we need to have a different postprocess method, we could implement a separate ImageProcessor within the qwenimage module and use it.
We don't need a separate image processor. I think we can use the existing one by looping over the layers and processing one layer at a time. If not, it's OK to keep the code here.
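The suggestion above can be sketched as follows. This is a minimal, self-contained illustration, not the PR's actual code: `postprocess` here is a hypothetical NumPy stand-in for the existing `VaeImageProcessor.postprocess` (denormalize, quantize, move channels last), and `postprocess_layers` shows the "loop over layers, process one layer at a time" idea.

```python
import numpy as np

def postprocess(batch):
    # Stand-in for VaeImageProcessor.postprocess:
    # (b, c, h, w) floats in [-1, 1] -> uint8 (b, h, w, c)
    batch = np.clip(batch * 0.5 + 0.5, 0.0, 1.0)
    batch = (batch * 255).round().astype("uint8")
    return np.transpose(batch, (0, 2, 3, 1))

def postprocess_layers(image):
    # image: (b, c, f, h, w), where f indexes the layers
    b, _, f, _, _ = image.shape
    # process one layer slice at a time with the shared postprocess
    per_layer = [postprocess(image[:, :, i]) for i in range(f)]
    # regroup into one list of f layer images per batch element
    return [[per_layer[i][bidx] for i in range(f)] for bidx in range(b)]
```

In the real pipeline, `self.image_processor.postprocess` would take the place of the stand-in, which is what the final implementation quoted later in this thread ends up doing.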
Thanks for the advice, I've got a better implementation in the new commit.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
yiyixuxu
left a comment
thanks!
looks really good to me & we can merge this soon!
sayakpaul
left a comment
LGTM on my end. Maybe we could just update https://huggingface.co/docs/diffusers/main/en/api/pipelines/qwenimage before we merge the PR?
I will try to add tests in a follow-up and ask for your reviews.
```python
if use_additional_t_cond:
    self.addition_t_embedding = nn.Embedding(2, embedding_dim)
```
Much better.
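The snippet above can be read as a standard binary-flag conditioning pattern: a 2-entry embedding table, keyed by a 0/1 flag, added to the timestep embedding. The following is a hedged sketch only; the class name `AdditionTCond` and the exact wiring into `forward` are assumptions for illustration, not the PR's implementation.

```python
import torch
from torch import nn

class AdditionTCond(nn.Module):
    """Hypothetical wrapper showing how a 2-entry embedding could condition temb."""

    def __init__(self, embedding_dim, use_additional_t_cond=True):
        super().__init__()
        # Mirrors `nn.Embedding(2, embedding_dim)` from the diff above
        self.addition_t_embedding = (
            nn.Embedding(2, embedding_dim) if use_additional_t_cond else None
        )

    def forward(self, temb, cond_flag):
        # temb: (b, embedding_dim); cond_flag: (b,) int64 with values 0 or 1
        if self.addition_t_embedding is not None:
            temb = temb + self.addition_t_embedding(cond_flag)
        return temb
```

Gating the table behind `use_additional_t_cond` keeps checkpoints without the extra conditioning loadable, since the module simply isn't registered when the flag is off.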
```python
image = self.vae.decode(latents, return_dict=False)[0]  # (b f) c 1 h w
image = image.squeeze(2)
image = self.image_processor.postprocess(image, output_type=output_type)
images = []
for bidx in range(b):
    images.append(image[bidx * f : (bidx + 1) * f])
```
This is clean! Thanks for this :)
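The regrouping loop in the snippet above is easy to see with a toy example: the VAE decodes a flat batch of b * f layer images, and the slice `image[bidx * f : (bidx + 1) * f]` splits them back into b groups of f layers each. Strings stand in for the decoded images here.

```python
# Flat batch of b * f decoded layer images, as produced by the VAE
b, f = 2, 4
flat = [f"img_{i}" for i in range(b * f)]

# Regroup: one list of f layer images per batch sample
images = [flat[bidx * f : (bidx + 1) * f] for bidx in range(b)]
# images[0] holds the 4 layers of sample 0, images[1] those of sample 1
```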
@bot /style
Style bot fixed some files and pushed the changes.
make fix-copies
Failing tests are unrelated. Looking forward to seeing Qwen Image Layered become a grand success! Also, hopefully, this kicks off a new paradigm in image generation!
@naykun Although image is listed as an optional pipeline parameter, the pipeline cannot run successfully without passing an image. Looking forward to a fix. Best regards!
🚀 This PR introduces Qwen Image Layered—a groundbreaking vision model that dissects images into rich, structured layers (think foreground, background, objects, and more)!
By unlocking pixel-perfect, semantically aware decomposition, we’re not just enabling smarter image editing—we’re igniting a whole new playground for creators, developers, and AI artists. Imagine remixing scenes like multitrack audio, editing objects in isolation, or generating dynamic compositions with unprecedented control.
The future of image generation is layered, modular, and collaborative—and it starts right here. Let’s build it together! 🎨✨
cc @sayakpaul @yiyixuxu