Conversation

radames (Collaborator) commented Dec 26, 2023

Demo Notes

  • Frontend: Svelte
  • Backend: FastAPI / WebSocket / MJPEG stream
  • The wrapper is copied and modified here to accept a prompt for img2img and an engine_dir, letting me specify the directory and reuse the compiled model in the Docker environment.

All the StreamDiffusion code is in img2img.py. Please feel free to add any speedup suggestions.
I'm using t_index_list=[35, 45]. Is there a way to provide a strength on a 0-1 scale?

cumulo-autumn (Owner) commented Dec 26, 2023

We will add a noise scheduler function to support parametric controls such as noise strength 0.0-1.0 and num_denoising_steps 1-50. We will also review the PR soon!
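For illustration, a 0-1 strength could map onto t_index_list roughly as below. This is only a sketch assuming a 50-step scheduler; strength_to_t_index_list is a hypothetical helper, not the planned API:

```python
# Hypothetical helper (illustration only): map a diffusers-style
# img2img strength (0-1) onto a StreamDiffusion t_index_list.
# Higher strength -> start denoising earlier, replacing more of the input.

def strength_to_t_index_list(strength: float,
                             num_denoising_steps: int = 2,
                             num_train_steps: int = 50) -> list[int]:
    assert 0.0 < strength <= 1.0
    start = int(num_train_steps * (1.0 - strength))  # e.g. 0.3 -> 35
    stop = num_train_steps - 1                       # last usable index
    step = max((stop - start) // num_denoising_steps, 1)
    return [start + i * step for i in range(num_denoising_steps)]

print(strength_to_t_index_list(0.3))  # [35, 42], close to the [35, 45] used here
```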

GradientSurfer (Contributor) left a comment

Looks cool, nice work @radames! I've been tinkering with a strikingly similar set of changes, but using a canvas with drawing tools instead of webcam input.

If you and the team don't mind unsolicited feedback, I'll leave a review and share a few suggestions/thoughts that I hope you find helpful:

  1. Batch inference
    It appears image frames are processed one at a time in this demo, but batching multiple frames together for higher throughput (& FPS) should result in a smoother experience (at the expense of increased latency).

  2. Circular buffer & continuous streaming
    It looks like the server requests the client to send a frame - instead of this request/response cycle, the client could continuously stream image frames to the server which would maintain a circular buffer that can then be used to perform batch inference. Notably examples/screen/main.py uses that approach.

  3. No separate endpoint/stream for returning generated images
    The generated image could be returned to the client via the websocket connection, instead of via a separate API endpoint. This could be a minor code simplification, and notably would sidestep the linked chromium bug (so we could avoid sending frames twice to every browser that isn't firefox).

  4. Return raw pixels in RGBA format
    Generated images can be returned to the client in raw RGBA pixel format and then directly written to the canvas. This may be a relatively minor optimization, but it avoids the overhead of transforming to & from JPEG format and any associated lossy compression.

  5. Integrate wrapper.py modifications
    The modifications to accept a prompt for img2img and to support engine_dir are great and seem well contained, those ought to be integrated into the canonical wrapper.py so there is no unnecessary duplication of code or maintenance burden.

Perhaps these ideas could be addressed here or in future PRs (or not at all), either way I'd be happy to discuss or collaborate further on details - feel free to reach out.

radames (Collaborator, Author) commented Dec 27, 2023

Hi @GradientSurfer, thanks for the detailed response! I really appreciate the feedback, and I'm happy to address some of your points on this PR. If you're interested in collaborating, please send edits/commits. Do you have PR edit access?

  1. Batch inference
    It appears image frames are processed one at a time in this demo, but batching multiple frames together for higher throughput (& FPS) should result in a smoother experience (at the expense of increased latency).

Addressing this together with number 2 below: we can try a batching approach!

  2. Circular buffer & continuous streaming
    It looks like the server requests the client to send a frame - instead of this request/response cycle, the client could continuously stream image frames to the server which would maintain a circular buffer that can then be used to perform batch inference. Notably examples/screen/main.py uses that approach.

Ohh yes, that makes a lot of sense. In my original demo with LCM, I did use an async queue, but back then inference was slow and the result was a lagged video, so I decided to switch to the ping/pong approach.
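For reference, the server side of the circular-buffer idea could look roughly like the sketch below. This is illustrative only, assuming FastAPI websockets; run_batch_img2img is a hypothetical stand-in for a batched wrapper call, not the actual API. Returning results over the same websocket, as done here, would also cover point 3:

```python
# Sketch: the client streams frames continuously; the server keeps only
# the newest frames in a fixed-size deque (circular buffer) and runs
# batch inference over a snapshot of it.
import asyncio
from collections import deque

BATCH_SIZE = 4
frame_buffer: deque = deque(maxlen=BATCH_SIZE)  # old frames drop off automatically

def run_batch_img2img(frames: list[bytes]) -> list[bytes]:
    """Hypothetical stand-in for a batched img2img call."""
    raise NotImplementedError

async def receive_frames(websocket):
    # No request/response round trip: the client just keeps pushing frames.
    while True:
        frame_buffer.append(await websocket.receive_bytes())

async def inference_worker(websocket):
    while True:
        if len(frame_buffer) == BATCH_SIZE:
            batch = list(frame_buffer)  # snapshot of the newest frames
            images = await asyncio.to_thread(run_batch_img2img, batch)
            for image_bytes in images:
                await websocket.send_bytes(image_bytes)
        else:
            await asyncio.sleep(0.001)  # let the buffer fill
```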

  3. No separate endpoint/stream for returning generated images
    The generated image could be returned to the client via the websocket connection, instead of via a separate API endpoint. This could be a minor code simplification, and notably would sidestep the linked chromium bug (so we could avoid sending frames twice to every browser that isn't firefox).

Yes, that's a great point. The MJPEG stream seems a bit awkward and is buggy on Chrome. Ideally it would use WebRTC, but I was aiming for performance and simplicity: streaming JPEGs over an open socket looked faster to me than sending blobs over the websocket, which needs extra decoding to get the bytes into the <img>. However, this demo seems very fast and it does blob over websockets -> <img>: https://www.fal.ai/camera
  4. Return raw pixels in RGBA format
    Generated images can be returned to the client in raw RGBA pixel format and then directly written to the canvas. This may be a relatively minor optimization, but it avoids the overhead of transforming to & from JPEG format and any associated lossy compression.

Yes, you're right. However, I used the canvas to normalize the webcam image, cropping it to the desired size; this could also be done on the backend, whichever is faster.
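On the server side, returning raw RGBA could be as small as the sketch below. It assumes the pipeline output is a PIL image; the width/height header is just one illustrative way to frame the raw bytes for the client:

```python
# Sketch: send raw RGBA pixels instead of JPEG, so the client can blit
# them straight into a canvas (no lossy re-encode/decode round trip).
import struct
from PIL import Image

def to_rgba_message(image: Image.Image) -> bytes:
    rgba = image.convert("RGBA")
    # Prefix width/height so the client can rebuild an ImageData of the
    # right shape from the pixel bytes that follow.
    header = struct.pack("!II", rgba.width, rgba.height)
    return header + rgba.tobytes()

# e.g. await websocket.send_bytes(to_rgba_message(generated_image))
```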

  5. Integrate wrapper.py modifications
    The modifications to accept a prompt for img2img and to support engine_dir are great and seem well contained, those ought to be integrated into the canonical wrapper.py so there is no unnecessary duplication of code or maintenance burden.

Done in PR "wrapper.py: pass prompt to img2img and optional engine_dir arg" #66; once it's merged, I can update it here.


GradientSurfer (Contributor) commented

@radames I do not have PR edit access here, @cumulo-autumn perhaps you would consider granting collaborator access?

radames (Collaborator, Author) commented Dec 30, 2023

Hi @cumulo-autumn, I think it's good now. I've fixed a couple of uncaught exceptions. One important note: while the server and the client were designed to accept multiple queued connections, the wrapper and StreamDiffusionWrapper don't work well in that regard, i.e. the buffer behind stream.stream is shared, so if you open multiple browser tabs and switch the prompt and webcams, you'll notice images leaking across tabs. For comparison, when using a diffusers pipe(...) it's possible to queue calls, as long as inference is quick; example here.
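To illustrate the diffusers-style queuing I mean, a minimal sketch: a single pipe guarded by an asyncio.Lock, so concurrent sessions queue up instead of interleaving. This serializes short calls but would not fix StreamDiffusion's shared rolling buffer, which would need per-connection stream state:

```python
# Sketch: serialize access to one shared pipeline so concurrent
# websocket sessions don't interleave each other's state.
import asyncio

pipe_lock = asyncio.Lock()

async def predict(pipe, image, prompt: str):
    async with pipe_lock:  # one inference at a time; callers queue up
        # Run the blocking pipeline call off the event loop thread.
        return await asyncio.to_thread(pipe, prompt=prompt, image=image)
```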
PS: please pull and test again on Windows if you can.

cumulo-autumn (Owner) commented


Hi @GradientSurfer. Thank you for your valuable PR submissions in the past, and for your many meaningful suggestions this time as well! Regarding PR edit access: currently we are keeping it within a group of acquaintances, so please allow us to hold off on adding new collaborators for now. However, we are very much open to more discussions and PRs in the future, so we definitely want to continue those! (I apologize for the late response; it has been a busy end-of-year period. I really appreciate your prompt and valuable feedback on this PR.) We will revisit our policy on granting PR edit access in the future!

cumulo-autumn (Owner) commented


Hi @radames. Thank you for the update! It works perfectly in my environment too! I am going to merge it.

cumulo-autumn merged commit a3d01c4 into cumulo-autumn:main on Dec 30, 2023
radames deleted the dev/demo-img2img branch on Dec 31, 2023
openSourcerer9000 commented
@cumulo-autumn I'm not seeing a denoising_strength param in the latest repo. How do we set it? Thanks!
