
Commit 9931bb1

committed: Updated
1 parent 1f606d0 commit 9931bb1


4 files changed: +139 -1 lines changed


README.md

Lines changed: 59 additions & 1 deletion
# ComfyUI-AutoLabel

ComfyUI-AutoLabel is a custom node for [ComfyUI](https://github.com/comfyanonymous/ComfyUI) that uses BLIP (Bootstrapping Language-Image Pre-training) to generate detailed descriptions of the main object in an image, providing accurate, context-aware captions.

![ComfyUI-AutoLabel](demo.png)

## Features
- **Image to Text Description**: Generate detailed descriptions of the main object in an image.
- **Customizable Prompts**: Provide your own prompt to guide the description generation.
- **Flexible Inference Modes**: Supports GPU, GPU with float16, and CPU inference modes.
- **Offline Mode**: Option to run from locally cached models instead of downloading them at runtime.
## Installation

1. **Clone the Repository**: Clone this repository into your `custom_nodes` folder in ComfyUI.

   ```bash
   git clone https://github.com/fexploit/ComfyUI-AutoLabel custom_nodes/ComfyUI-AutoLabel
   ```

2. **Install Dependencies**: Navigate to the cloned folder and install the required dependencies.

   ```bash
   cd custom_nodes/ComfyUI-AutoLabel
   pip install -r requirements.txt
   ```
## Usage

### Adding the Node

1. Start ComfyUI.
2. Add the `AutoLabel` node from the custom nodes list.
3. Connect an image input and configure the parameters as needed.
### Parameters

- `image` (required): The input image tensor.
- `prompt` (optional): A string to guide the description generation (default: "a photography of").
- `repo_id` (optional): The Hugging Face model repository ID (default: "Salesforce/blip-image-captioning-base").
- `inference_mode` (optional): The inference mode, one of "gpu_float16", "gpu", or "cpu" (the node's dropdown defaults to the first option, "gpu_float16").
- `get_model_online` (optional): Boolean flag to download the model from the Hugging Face Hub if it is not already cached locally (default: True). When set to False, the model must already be in the local cache; see the sketch below.
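When `get_model_online` is False, `autolabel.py` sets `TRANSFORMERS_OFFLINE=1`, so the BLIP weights have to be in the local Hugging Face cache already. A minimal sketch for pre-populating the cache while online (not part of the original README):

```python
# Run this once while online so the BLIP processor and weights are cached locally.
# Afterwards the node can run with get_model_online = False (TRANSFORMERS_OFFLINE=1).
from transformers import BlipProcessor, BlipForConditionalGeneration

repo_id = "Salesforce/blip-image-captioning-base"
BlipProcessor.from_pretrained(repo_id)
BlipForConditionalGeneration.from_pretrained(repo_id)
```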
## Contributing

Contributions are welcome! Please open an issue or submit a pull request with your changes.

## License

This project is licensed under the MIT License.

## Acknowledgements

- [ComfyUI](https://github.com/comfyanonymous/ComfyUI)
- [BLIP](https://huggingface.co/Salesforce/blip-image-captioning-base)

## Contact

For any inquiries, please open an issue on the [GitHub repository](https://github.com/fexploit/ComfyUI-AutoLabel).

__init__.py

Lines changed: 9 additions & 0 deletions
from .autolabel import AutoLabel

# Register the node class and its display name so ComfyUI can discover the node.
NODE_CLASS_MAPPINGS = {
    "AutoLabel": AutoLabel
}

NODE_DISPLAY_NAME_MAPPINGS = {
    "AutoLabel": "Auto Label"
}

autolabel.py

Lines changed: 71 additions & 0 deletions
import os

import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration


class AutoLabel:
    def __init__(self):
        pass

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "image": ("IMAGE",),
                "prompt": ("STRING", {"default": "a photography of"}),
                "repo_id": ("STRING", {"default": "Salesforce/blip-image-captioning-base"}),
                "inference_mode": (["gpu_float16", "gpu", "cpu"],),
                "get_model_online": ("BOOLEAN", {"default": True}),
            }
        }

    RETURN_TYPES = ("STRING",)
    RETURN_NAMES = ("main_object_description",)
    FUNCTION = "generate_caption"
    CATEGORY = "AutoLabel"

    def tensor_to_image(self, tensor):
        # Convert a ComfyUI image tensor (float values in [0, 1]) to a PIL RGB image.
        tensor = tensor.cpu()
        image_np = tensor.squeeze().mul(255).clamp(0, 255).byte().numpy()
        image = Image.fromarray(image_np, mode="RGB")
        return image

    def generate_caption(self, image, prompt, repo_id, inference_mode, get_model_online):
        if image is None:
            raise ValueError("Need an image")
        if not repo_id:
            raise ValueError("Need a repo_id or local_model_path")

        # When offline mode is requested, force transformers to use only locally cached files.
        if not get_model_online:
            os.environ["TRANSFORMERS_OFFLINE"] = "1"

        processor = BlipProcessor.from_pretrained(repo_id)
        pil_image = self.tensor_to_image(image)

        try:
            # Load the BLIP captioning model on the requested device and precision.
            if inference_mode == "gpu_float16":
                model = BlipForConditionalGeneration.from_pretrained(repo_id, torch_dtype=torch.float16).to("cuda")
                inputs = processor(pil_image, prompt, return_tensors="pt").to("cuda", torch.float16)
            elif inference_mode == "gpu":
                model = BlipForConditionalGeneration.from_pretrained(repo_id).to("cuda")
                inputs = processor(pil_image, prompt, return_tensors="pt").to("cuda")
            else:
                model = BlipForConditionalGeneration.from_pretrained(repo_id)
                inputs = processor(pil_image, prompt, return_tensors="pt")

            # Generate the caption conditioned on the prompt and decode it to plain text.
            out = model.generate(**inputs)
            description = processor.decode(out[0], skip_special_tokens=True)
            return (description,)

        except Exception as e:
            print(e)
            return ("Error occurred during caption generation",)


NODE_CLASS_MAPPINGS = {
    "AutoLabel": AutoLabel
}

NODE_DISPLAY_NAME_MAPPINGS = {
    "AutoLabel": "Auto Label"
}
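The node can also be exercised outside ComfyUI as a quick smoke test. The sketch below is illustrative and not part of this commit: it assumes ComfyUI's `[batch, height, width, channel]` float image layout with values in `[0, 1]`, and that it is run from the ComfyUI-AutoLabel folder so `autolabel` is importable. The first run downloads the BLIP weights.

```python
# Minimal standalone smoke test (hypothetical, not part of this commit).
import torch
from autolabel import AutoLabel  # assumes the current directory is ComfyUI-AutoLabel

node = AutoLabel()
dummy = torch.rand(1, 384, 384, 3)  # stand-in for a ComfyUI image tensor in [0, 1]

(caption,) = node.generate_caption(
    image=dummy,
    prompt="a photography of",
    repo_id="Salesforce/blip-image-captioning-base",
    inference_mode="cpu",        # no GPU needed for the smoke test
    get_model_online=True,       # allow the weights to be downloaded on first run
)
print(caption)
```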

demo.png

282 KB

0 commit comments
