A joint CSR initiative of Microsoft & SAP
Hitesh Kumar
BE-CSE AI/ML Student, Chandigarh University
Portfolio: Hitesh Kumar
This project explores AI-driven image generation with Stable Diffusion and ComfyUI. The objective is to generate high-quality photorealistic and artistic images from text prompts using a deep-learning pipeline. ComfyUI, a node-based workflow interface for Stable Diffusion, is used to generate and refine the images. This report discusses the implementation details, workflow structure, dataset preparation, and the automation pipeline used in the project.
- Introduction
- Project Objectives
- Technologies Used
- System Architecture
- Implementation
- Results and Discussion
- Challenges Faced
- Future Work
- Conclusion
- References
Introduction
AI image generation has advanced rapidly with models such as Stable Diffusion, which can produce highly detailed images from text prompts. This project uses ComfyUI, a node-based workflow UI, to build a structured image-generation pipeline and tune it for performance.
Project Objectives
- Implement an AI-powered image generation system.
- Use Stable Diffusion to generate high-resolution images from text prompts.
- Integrate ComfyUI as an efficient, user-friendly interface.
- Automate image generation through a Gradio-based UI and API requests.
- Optimize workflow configurations for better output quality.
Technologies Used
- Stable Diffusion - Latent diffusion model for text-to-image generation.
- ComfyUI - Node-based workflow UI for Stable Diffusion.
- Gradio - Python framework for building web UIs for machine-learning applications.
- Python - Core implementation language.
- PIL (Pillow) - Image processing library.
- Flask - Backend API development.
- FAISS - Vector similarity search library.
- NumPy & OpenCV - Numerical data manipulation and image processing.
System Architecture
The project follows a modular pipeline approach:
- Input Handling - The user provides a text prompt.
- Processing - The prompt is sent to Stable Diffusion via ComfyUI.
- Image Generation - The model generates images based on the prompt.
- Post-processing - Refinements such as upscaling and enhancement are applied.
- Output Display - The final image is displayed in the Gradio UI.
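The five stages above can be sketched as a chain of small functions. This is a minimal, stdlib-only sketch under stated assumptions, not the project's code: `fake_generate` is a hypothetical stub standing in for the ComfyUI call, images are toy grids of grayscale values, and nearest-neighbour upscaling stands in for the project's unspecified post-processing.

```python
from dataclasses import dataclass, field
from typing import Callable, List

ToyImage = List[List[int]]  # toy grayscale image: rows of 0-255 pixel values


@dataclass
class PipelineResult:
    prompt: str
    image: ToyImage
    log: List[str] = field(default_factory=list)


def handle_input(raw_prompt: str) -> str:
    """Stage 1: normalise whitespace and reject empty prompts."""
    prompt = " ".join(raw_prompt.split())
    if not prompt:
        raise ValueError("prompt must not be empty")
    return prompt


def upscale_nearest(image: ToyImage, factor: int = 2) -> ToyImage:
    """Stage 4 (toy): nearest-neighbour upscaling by an integer factor."""
    out: ToyImage = []
    for row in image:
        wide = [px for px in row for _ in range(factor)]  # repeat each pixel horizontally
        out.extend(list(wide) for _ in range(factor))      # repeat each row vertically
    return out


def fake_generate(prompt: str) -> ToyImage:
    """Hypothetical stand-in for Stable Diffusion: a 2x2 image derived from the prompt."""
    v = len(prompt) % 256
    return [[v, 0], [0, v]]


def run_pipeline(raw_prompt: str, generate: Callable[[str], ToyImage],
                 factor: int = 2) -> PipelineResult:
    prompt = handle_input(raw_prompt)                  # 1. input handling
    log = [f"queued: {prompt}"]                         # 2. processing (sent to ComfyUI in the real app)
    image = generate(prompt)                            # 3. image generation
    log.append(f"generated {len(image[0])}x{len(image)}")
    image = upscale_nearest(image, factor)              # 4. post-processing
    log.append(f"upscaled to {len(image[0])}x{len(image)}")
    return PipelineResult(prompt, image, log)           # 5. result handed to the UI for display


if __name__ == "__main__":
    result = run_pipeline("  a   red fox  ", fake_generate)
    print("\n".join(result.log))
```

Passing the generator in as a parameter keeps the input, post-processing, and output stages testable without a GPU; in a real deployment that callable would wrap the request to ComfyUI.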
The repository is organized as follows:
└── hiteshydv001-image-generation-using-stable-diffusion-comfyui/
├── README.md
├── app.py
├── example_nested_loop.py
├── example_with_random_seed.py
├── requirements.txt
├── text_to_image.json
└── workflow_api.json
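The file names `workflow_api.json`, `example_with_random_seed.py`, and `example_nested_loop.py` suggest that the project edits an exported ComfyUI workflow (API format: a JSON object mapping node ids to `class_type` and `inputs`) and queues variants with new prompts and seeds. Those scripts are not shown, so the following is a hypothetical sketch. `EXAMPLE_WORKFLOW` is a trimmed illustrative excerpt modelled on ComfyUI's default text-to-image graph (it omits the loader, latent, decode, and save nodes, so it is not queueable as-is), and the node ids in the project's file may differ. Instead of hard-coding ids, the sketch follows the KSampler's `positive` link to find the prompt node.

```python
import itertools
import json
import random
from typing import Any, Dict, Iterator, List, Optional, Tuple

Workflow = Dict[str, Dict[str, Any]]  # API format: node id -> {"class_type": ..., "inputs": {...}}

# Illustrative excerpt only; ids and values in workflow_api.json may differ.
EXAMPLE_WORKFLOW: Workflow = {
    "3": {"class_type": "KSampler", "inputs": {
        "seed": 0, "steps": 20, "cfg": 8.0, "sampler_name": "euler", "scheduler": "normal",
        "denoise": 1.0, "model": ["4", 0], "positive": ["6", 0],
        "negative": ["7", 0], "latent_image": ["5", 0]}},
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "placeholder", "clip": ["4", 1]}},
    "7": {"class_type": "CLIPTextEncode", "inputs": {"text": "blurry, low quality", "clip": ["4", 1]}},
}


def find_sampler(workflow: Workflow) -> str:
    """Return the id of the first KSampler node."""
    for node_id, node in workflow.items():
        if node.get("class_type") == "KSampler":
            return node_id
    raise KeyError("no KSampler node in workflow")


def set_prompt_and_seed(workflow: Workflow, prompt: str, seed: Optional[int] = None) -> Workflow:
    """Copy the workflow, then set the positive prompt text and the sampler seed."""
    wf = json.loads(json.dumps(workflow))  # deep copy keeps the template reusable
    sampler = wf[find_sampler(wf)]["inputs"]
    positive_id = sampler["positive"][0]   # link format: [source node id, output slot]
    wf[positive_id]["inputs"]["text"] = prompt
    sampler["seed"] = seed if seed is not None else random.randint(0, 2**32 - 1)
    return wf


def batch(workflow: Workflow, prompts: List[str],
          seeds: List[int]) -> Iterator[Tuple[str, int, Workflow]]:
    """Nested loop over prompts x seeds, yielding one ready-to-queue workflow each."""
    for prompt, seed in itertools.product(prompts, seeds):
        yield prompt, seed, set_prompt_and_seed(workflow, prompt, seed)


if __name__ == "__main__":
    # With the real file: with open("workflow_api.json") as f: template = json.load(f)
    for prompt, seed, wf in batch(EXAMPLE_WORKFLOW, ["a red fox", "a lighthouse"], [1, 2]):
        print(seed, wf["6"]["inputs"]["text"])
```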
Implementation
The interface is built with Gradio, letting users enter prompts and generate images interactively.
- Custom UI with a dark theme.
- Dynamic text-to-image generation.
- Automated prompt processing.
- Status updates and image previews.
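```python
import gradio as gr
from PIL import Image

def generate_image(prompt):
    # The real implementation in app.py sends the prompt to the ComfyUI API
    # and returns the generated image. gr.Image(type="pil") expects a PIL image,
    # so this placeholder returns a blank 512x512 image instead of a string.
    return Image.new("RGB", (512, 512), color="black")

demo = gr.Interface(
    fn=generate_image,
    inputs=gr.Textbox(placeholder="Enter your prompt here..."),
    outputs=gr.Image(type="pil"),
    title="AI Image Generator",
)

if __name__ == "__main__":
    demo.launch()
```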
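The `generate_image` callback delegates the actual generation to the ComfyUI API inside app.py, whose code is not shown here. The following is a minimal sketch of such a call using only the Python standard library, assuming a local ComfyUI server at `127.0.0.1:8188` (ComfyUI's default) and its HTTP endpoints: `POST /prompt` to queue a workflow, `GET /history/{prompt_id}` to check for results, and `GET /view` to download an image. The helper names and the polling strategy are illustrative choices, not the project's method; the timeout reflects the generation latency of seconds to minutes.

```python
import json
import time
import urllib.parse
import urllib.request
import uuid
from typing import Any, Dict, List, Optional

SERVER = "127.0.0.1:8188"  # assumed default ComfyUI address; adjust for your setup


def build_prompt_payload(workflow: Dict[str, Any], client_id: Optional[str] = None) -> bytes:
    """Encode an API-format workflow as the JSON body for POST /prompt."""
    body = {"prompt": workflow, "client_id": client_id or str(uuid.uuid4())}
    return json.dumps(body).encode("utf-8")


def image_entries(history: Dict[str, Any], prompt_id: str) -> List[Dict[str, str]]:
    """Collect every image record produced for prompt_id in a /history response."""
    outputs = history.get(prompt_id, {}).get("outputs", {})
    return [img for node_out in outputs.values() for img in node_out.get("images", [])]


def view_url(image_info: Dict[str, str], server: str = SERVER) -> str:
    """Build the /view URL that serves one generated image."""
    query = urllib.parse.urlencode({
        "filename": image_info["filename"],
        "subfolder": image_info.get("subfolder", ""),
        "type": image_info.get("type", "output"),
    })
    return f"http://{server}/view?{query}"


def queue_and_fetch(workflow: Dict[str, Any], server: str = SERVER,
                    timeout: float = 300.0, poll: float = 1.0) -> List[bytes]:
    """Queue a workflow, poll /history until it finishes, then download the images."""
    req = urllib.request.Request(f"http://{server}/prompt",
                                 data=build_prompt_payload(workflow),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:  # supplying data makes this a POST
        prompt_id = json.load(resp)["prompt_id"]
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with urllib.request.urlopen(f"http://{server}/history/{prompt_id}") as resp:
            history = json.load(resp)
        if prompt_id in history:  # the entry appears once execution has finished
            images = []
            for info in image_entries(history, prompt_id):
                with urllib.request.urlopen(view_url(info, server)) as img:
                    images.append(img.read())
            return images
        time.sleep(poll)
    raise TimeoutError(f"prompt {prompt_id} did not finish within {timeout}s")
```

Inside `generate_image`, the first returned byte string could be converted for `gr.Image(type="pil")` with `PIL.Image.open(io.BytesIO(data))`. Polling keeps the sketch dependency-free; ComfyUI also offers a WebSocket channel for progress updates, which would suit the "status updates" feature better.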
Results and Discussion
The system generates detailed images from text prompts. The ComfyUI workflow gives fine-grained control over generation settings such as the sampler, step count, and seed, which makes it a flexible tool for AI-generated art and photorealistic imagery.
Challenges Faced
- Computational Requirements: Running Stable Diffusion locally requires a GPU with substantial memory.
- Prompt Engineering: Generating specific images requires carefully crafted prompts.
- Latency: Generating an image takes from a few seconds to several minutes, depending on settings such as resolution and step count.
Future Work
- Improve model efficiency and reduce generation time.
- Implement real-time fine-tuning options for users.
- Expand the UI with more style customization options.
- Deploy on Hugging Face Spaces for cloud-based access.
Conclusion
This project demonstrates the application of Stable Diffusion and ComfyUI to AI-driven image generation. Built on a streamlined workflow, the system gives users an intuitive interface for creating AI-generated artwork.
References
- Stable Diffusion Documentation - https://stablediffusionweb.com/
- ComfyUI GitHub Repository - https://github.com/comfyanonymous/ComfyUI
- Gradio Documentation - https://www.gradio.app/
I sincerely thank AICTE, Microsoft, and SAP for the opportunity to work on this AI internship under the Tech Saksham initiative, which enabled me to explore advanced AI-driven image generation techniques.