
Commit 4216c03

update readme
1 parent 3274253


README.md

Lines changed: 37 additions & 20 deletions
@@ -25,7 +25,7 @@
<!-- ABOUT THE PROJECT -->
# About

This is the code that we used to build our CLIP semantic search engine for [krea.ai](https://krea.ai). This work is heavily inspired by [clip-retrieval](https://github.com/rom1504/clip-retrieval/), [autofaiss](https://github.com/criteo/autofaiss), and [CLIP-ONNX](https://github.com/Lednik7/CLIP-ONNX). We kept our implementation simple, focused it on working with data from [open-prompts](https://github.com/krea-ai/open-prompts), and prepared it to run efficiently on a CPU.

# CLIP Search

@@ -36,16 +36,16 @@ If you are not familiar with CLIP, we would recommend starting with the [blog](h
CLIP is a multi-modal neural network that can encode both images and text in a common feature space. This means that we can create vectors that contain semantic information extracted from a text or an image. We can use these semantic vectors to compute operations such as cosine similarity, which gives us a similarity score.

As a high-level example, when CLIP extracts features from an image of a red car, it produces a vector similar to the one it creates when it sees the text "a red car" or an image of another red car, since the semantics of all these elements are related.

So far, CLIP has been helpful for creating datasets like [LAION-5B](https://laion.ai/blog/laion-5b/), for guiding generative models like VQ-GAN, for image classification tasks where there is not a lot of labeled data, and as a backbone for AI models like Stable Diffusion.

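To make the idea concrete, here is a minimal sketch of comparing an image and a few texts with CLIP and cosine similarity. It uses the [openai/CLIP](https://github.com/openai/CLIP) package rather than this repository's scripts, and the image path and prompts are just placeholders.

```python
import clip
import torch
from PIL import Image

# Load the ViT-B/32 CLIP model on CPU.
model, preprocess = clip.load("ViT-B/32", device="cpu")

image = preprocess(Image.open("red_car.png").convert("RGB")).unsqueeze(0)  # placeholder image file
texts = clip.tokenize(["a red car", "a plate of spaghetti"])

with torch.no_grad():
    image_embedding = model.encode_image(image)
    text_embeddings = model.encode_text(texts)

# Normalize so that the dot product equals the cosine similarity.
image_embedding = image_embedding / image_embedding.norm(dim=-1, keepdim=True)
text_embeddings = text_embeddings / text_embeddings.norm(dim=-1, keepdim=True)

similarities = image_embedding @ text_embeddings.T
print(similarities)  # "a red car" should score noticeably higher
```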
## Semantic Search
Semantic search consists of finding similar items within a dataset by comparing feature vectors. These feature vectors are also known as embeddings, and they can be computed in different ways. CLIP is one of the most interesting models for extracting features for semantic search.

The search process consists of encoding items as embeddings, indexing them, and using these indices for fast search. Romain Beaumont wrote a great [Medium post](https://rom1504.medium.com/semantic-search-with-embeddings-index-anything-8fb18556443c) about semantic search; we highly recommend reading it.

With this code, you will compute embeddings using CLIP, index them using K-Nearest Neighbors, and search for similarities efficiently given an input CLIP embedding.

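As a minimal sketch of the indexing and search steps, here is what a K-Nearest-Neighbors index looks like with [faiss](https://github.com/facebookresearch/faiss). The random vectors stand in for real CLIP embeddings, and the exact index type used in this repository may differ.

```python
import faiss
import numpy as np

d = 512                                                   # dimensionality of ViT-B/32 CLIP embeddings
embeddings = np.random.rand(10000, d).astype("float32")   # stand-ins for real CLIP embeddings
faiss.normalize_L2(embeddings)                            # normalized vectors: inner product == cosine similarity

index = faiss.IndexFlatIP(d)                              # exact inner-product index
index.add(embeddings)

query = np.random.rand(1, d).astype("float32")            # stand-in for a query CLIP embedding
faiss.normalize_L2(query)
scores, ids = index.search(query, 5)                      # ids of the 5 nearest neighbors
print(ids, scores)
```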
# Environment

@@ -67,7 +67,7 @@ Create a new conda environment with the following command:

# Data Preparation

We will use a dataset of prompts generated with Stable Diffusion from [open-prompts](https://github.com/krea-ai/open-prompts). `1k.csv` is a subset of a [larger dataset](https://drive.google.com/file/d/1c4WHxtlzvHYd0UY5WCMJNn2EO-Aiv2A0/view) that you can find there; it is perfect for testing!

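If you want a quick look at the data before running anything, a few lines of pandas are enough; this snippet makes no assumption about the column names and simply prints what the file contains.

```python
import pandas as pd

# Inspect the small test split before processing the full dataset.
df = pd.read_csv("1k.csv")
print(df.shape)
print(df.columns.tolist())
print(df.head())
```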
# CLIP Search

@@ -85,79 +85,96 @@ First, *sign in* or *sign up* to [Lambda](https://lambdalabs.com/cloud/entrance)
## Search Images

### Download images

The first step consists of downloading the images from the `csv` file that contains all the prompt data. To do so, we will leverage the [img2dataset](https://github.com/rom1504/img2dataset) package.

Execute the following command to create a new file with the image links:

```
python extract_img_links_from_csv.py
```

Note that, by default, the process will create the links from `1k.csv`. Change the `CSV_FILE` variable in `extract_img_links_from_csv.py` if you want to use another data file as input:

```python
CSV_FILE = "./1k.csv"
OUTPUT_FILE = "./img_links.txt"
```
The results will be stored in `img_links.txt`.

Run the following command to download the images:

```bash
img2dataset --url_list img_links.txt --output_folder imgs --thread_count=64 --image_size=256
```

The output will be stored in a sub-folder named `00000` within `imgs`.

### Compute Visual CLIP Embeddings

Once the folder `imgs` is created and filled with the generated images, you can run the following command to compute visual CLIP embeddings for each of them:

`python extract_visual_clip_embeddings.py`

The following are the main parameters that you might need to change in `extract_visual_clip_embeddings.py`:

```python
IMG_DIR = "./imgs/00000"  # directory where all your images were downloaded
BATCH_SIZE = 128          # number of CLIP embeddings computed at each iteration
NUM_WORKERS = 14          # number of workers that run in parallel (recommended: number_of_cores - 2)
```

Once the process is finished, you will see a new folder named `visual_embeddings`. It will contain two sub-folders, `ids` and `embeddings`. `ids` will contain `.npy` files with the ids of the generations processed in each batch, and `embeddings` will contain `.npy` files with the embeddings computed for each batch. This data will be useful for computing the KNN indices.

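For reference, the extraction step boils down to something like the sketch below. It is a simplified illustration based on the openai/CLIP package, not the repository's actual `extract_visual_clip_embeddings.py`, and the per-batch file-naming scheme is an assumption.

```python
import os

import clip
import numpy as np
import torch
from PIL import Image
from torch.utils.data import DataLoader, Dataset

IMG_DIR = "./imgs/00000"
OUT_DIR = "./visual_embeddings"
BATCH_SIZE = 128

model, preprocess = clip.load("ViT-B/32", device="cpu")

class ImageFolder(Dataset):
    """Yields (preprocessed image, filename) pairs for every image in a folder."""
    def __init__(self, root):
        self.root = root
        self.files = sorted(f for f in os.listdir(root) if f.endswith((".jpg", ".png", ".webp")))

    def __len__(self):
        return len(self.files)

    def __getitem__(self, i):
        img = Image.open(os.path.join(self.root, self.files[i])).convert("RGB")
        return preprocess(img), self.files[i]

os.makedirs(f"{OUT_DIR}/embeddings", exist_ok=True)
os.makedirs(f"{OUT_DIR}/ids", exist_ok=True)

loader = DataLoader(ImageFolder(IMG_DIR), batch_size=BATCH_SIZE)
for batch_idx, (images, ids) in enumerate(loader):
    with torch.no_grad():
        embeddings = model.encode_image(images).cpu().numpy()
    np.save(f"{OUT_DIR}/embeddings/{batch_idx}.npy", embeddings)  # one file of embeddings per batch
    np.save(f"{OUT_DIR}/ids/{batch_idx}.npy", np.array(ids))      # the matching image ids for that batch
```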
### Compute Visual KNN indices
If you did not make any modifications to the default output structure from the previous step, this process should be as easy as running the following command:

`python create_visual_knn_indices.py`

Otherwise, you might want to modify the following variables in `create_visual_knn_indices.py`:

```python
INDICES_FOLDER = "knn_indices"
EMBEDDINGS_DIR = "visual_embeddings"
```
The result will be stored within a new folder named `knn_indices` in a file named `visual_prompts.index`.

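For a sense of what index creation involves, here is a rough sketch using [autofaiss](https://github.com/criteo/autofaiss), one of the projects this code is inspired by; the arguments used by the actual `create_visual_knn_indices.py` may differ.

```python
from autofaiss import build_index

# Build a KNN index from the .npy embedding files produced in the previous step.
# Paths mirror the defaults above; the memory options should be tuned to your machine.
build_index(
    embeddings="visual_embeddings/embeddings",
    index_path="knn_indices/visual_prompts.index",
    index_infos_path="knn_indices/visual_prompts_infos.json",
    max_index_memory_usage="4G",
    current_memory_available="8G",
)
```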
### Search Images

In order to search generated images more efficiently, we will use an ONNX version of CLIP. We will use the implementation from [`CLIP-ONNX`](https://github.com/Lednik7/CLIP-ONNX) for this.

Install the following package:

```bash
pip install git+https://github.com/Lednik7/CLIP-ONNX.git --no-deps
```

Once installed, download the ONNX CLIP models with the following commands:

```bash
wget https://clip-as-service.s3.us-east-2.amazonaws.com/models/onnx/ViT-B-32/visual.onnx
wget https://clip-as-service.s3.us-east-2.amazonaws.com/models/onnx/ViT-B-32/textual.onnx
```
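If you are curious about the downloaded models, you can inspect them with plain `onnxruntime` before running the search script. The input and output names depend on how the models were exported, so the snippet below only prints them instead of assuming them.

```python
import onnxruntime as ort

# Open the text encoder on CPU and list the inputs/outputs it expects.
session = ort.InferenceSession("textual.onnx", providers=["CPUExecutionProvider"])
print([(i.name, i.shape, i.type) for i in session.get_inputs()])
print([(o.name, o.shape, o.type) for o in session.get_outputs()])
```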
Finally, execute the following line to perform the search with regular CLIP and ONNX CLIP:

```
python test_visual_knn_index.py
```

The result should be a list of the image filenames that are most similar to the prompt `"image of a blue robot with red background"` and to the image `prompt-search.png`.

Change the following parameters in `test_visual_knn_index.py` to try out different input prompts and images:

```python
INPUT_IMG_PATH = "./prompt-search.png"
INPUT_PROMPT = "image of a blue robot with red background"
NUM_RESULTS = 5
```
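To give an idea of what the search script does under the hood, here is a hedged sketch that combines the openai/CLIP package with faiss. The actual `test_visual_knn_index.py` may load the models and map ids back to filenames differently, and the assumption that the index expects normalized embeddings (cosine similarity) may not match your build settings.

```python
import clip
import faiss
import torch

index = faiss.read_index("knn_indices/visual_prompts.index")
model, _ = clip.load("ViT-B/32", device="cpu")

with torch.no_grad():
    query = model.encode_text(clip.tokenize(["image of a blue robot with red background"]))
query = query / query.norm(dim=-1, keepdim=True)  # assumes a cosine-similarity index
query = query.cpu().numpy().astype("float32")

scores, ids = index.search(query, 5)              # positions of the 5 most similar images
# Map the returned positions back to filenames using the .npy files in visual_embeddings/ids.
print(ids, scores)
```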
Have fun!
# Get in touch

- Follow and DM us on Twitter: [@krea_ai](https://twitter.com/krea_ai)
- Join [our Discord community](https://discord.gg/3mkFbvPYut)
- Email either `v` or `d` (`v` at `krea` dot `ai`; `d` at `krea` dot `ai`, respectively)
