
Conversation

@naykun
Contributor

@naykun naykun commented Dec 17, 2025

🚀 This PR introduces Qwen Image Layered—a groundbreaking vision model that dissects images into rich, structured layers (think foreground, background, objects, and more)!

By unlocking pixel-perfect, semantically aware decomposition, we’re not just enabling smarter image editing—we’re igniting a whole new playground for creators, developers, and AI artists. Imagine remixing scenes like multitrack audio, editing objects in isolation, or generating dynamic compositions with unprecedented control.

The future of image generation is layered, modular, and collaborative—and it starts right here. Let’s build it together! 🎨✨

cc @sayakpaul @yiyixuxu

Member

@sayakpaul sayakpaul left a comment


Thanks a lot for this PR! My comments should mostly be minor.

LMK if anything is unclear.

Let's also update with a test and the docs?

Comment on lines 926 to 932
image = torch.cat(image, dim=2) # b c f h w
image = image.permute(0, 2, 3, 4, 1) # b f h w c
image = (image * 0.5 + 0.5).clamp(0, 1).cpu().float().numpy()
image = (image * 255).round().astype("uint8")
images = []
for layers in image:
    images.append([Image.fromarray(layer) for layer in layers])
Member


Wondering if we can leverage self.image_processor.postprocess here? This is what we usually follow for the pipelines. If we need a different postprocess method, we could implement a separate ImageProcessor within the qwenimage module and use it.

Example:
https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/hunyuan_video1_5/image_processor.py

Collaborator


We don't need a separate image processor. I think we can probably use the existing one by looping over the layers and processing one layer at a time. If not, it's okay to keep the code here.
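The loop-over-layers idea could be sketched roughly like this. Note this is a hypothetical sketch: `simple_postprocess` stands in for `self.image_processor.postprocess` and returns uint8 numpy arrays where the real diffusers helper would return PIL images.

```python
import torch

# Hypothetical stand-in for self.image_processor.postprocess:
# takes a (n, c, h, w) tensor in [-1, 1], returns per-image uint8 arrays.
def simple_postprocess(batch):
    batch = (batch * 0.5 + 0.5).clamp(0, 1)
    arr = (batch.permute(0, 2, 3, 1).cpu().float().numpy() * 255).round().astype("uint8")
    return list(arr)

def decode_layers(image):
    # image: (b, c, f, h, w), where f is the number of layers per sample
    images = []
    for sample in image:                     # sample: (c, f, h, w)
        layers = sample.permute(1, 0, 2, 3)  # (f, c, h, w) acts as a batch
        images.append(simple_postprocess(layers))
    return images
```

Looping per sample keeps the layer dimension intact while still reusing one generic batch postprocess call, which is the spirit of the suggestion above.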

Contributor Author


Thanks for the advice, I've got a better implementation in the new commit.

@sayakpaul sayakpaul requested review from DN6 and yiyixuxu December 17, 2025 07:45
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@yiyixuxu yiyixuxu left a comment


Thanks! Looks really good to me, and we can merge this soon!

@naykun naykun requested review from sayakpaul and yiyixuxu December 17, 2025 10:15
Member

@sayakpaul sayakpaul left a comment


LGTM on my end. Maybe we could just update https://huggingface.co/docs/diffusers/main/en/api/pipelines/qwenimage before we merge the PR?

I will try to add tests in a follow-up and ask for your reviews.

Comment on lines +152 to +153
if use_additional_t_cond:
    self.addition_t_embedding = nn.Embedding(2, embedding_dim)
Member


Much better.
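For context on the hunk above, a 2-entry embedding like this is typically indexed by a per-sample 0/1 flag and added to the timestep embedding. The sketch below illustrates that pattern only; the module and argument names (`TimestepConditioner`, `cond_flag`) are illustrative, not the actual Qwen Image Layered code.

```python
import torch
import torch.nn as nn

class TimestepConditioner(nn.Module):
    # Illustrative wrapper: a binary condition flag selects one of two
    # learned vectors, which is added to the timestep embedding.
    def __init__(self, embedding_dim, use_additional_t_cond=True):
        super().__init__()
        self.use_additional_t_cond = use_additional_t_cond
        if use_additional_t_cond:
            self.addition_t_embedding = nn.Embedding(2, embedding_dim)

    def forward(self, t_emb, cond_flag=None):
        # t_emb: (b, d) timestep embedding; cond_flag: (b,) long tensor of 0/1
        if self.use_additional_t_cond and cond_flag is not None:
            t_emb = t_emb + self.addition_t_embedding(cond_flag)
        return t_emb
```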

Comment on lines +870 to +877
image = self.vae.decode(latents, return_dict=False)[0] # (b f) c 1 h w

image = image.squeeze(2)

image = self.image_processor.postprocess(image, output_type=output_type)
images = []
for bidx in range(b):
    images.append(image[bidx * f : (bidx + 1) * f])
Member


This is clean! Thanks for this :)

@sayakpaul
Member

@bot /style

@github-actions
Contributor

github-actions bot commented Dec 17, 2025

Style bot fixed some files and pushed the changes.

@sayakpaul
Member

@naykun I opened naykun#1 to help fix the consistency problems. Could you please check and merge?

@sayakpaul
Member

sayakpaul commented Dec 17, 2025

Failing tests are unrelated. Looking forward to seeing Qwen Image Layered become a grand success! Also, hopefully this kind of starts a new paradigm in image generation!

@sayakpaul sayakpaul merged commit f9c1e61 into huggingface:main Dec 17, 2025
10 of 11 checks passed
@gluttony-10

@naykun Although image is an optional input in the pipeline parameters, the pipeline cannot run successfully without one. Looking forward to a fix.

Best regards!
