<!-- ABOUT THE PROJECT -->
# About
This is the code that we used to build our CLIP semantic search engine for [krea.ai](https://krea.ai). This work is heavily inspired by [clip-retrieval](https://github.com/rom1504/clip-retrieval/), [autofaiss](https://github.com/criteo/autofaiss), and [CLIP-ONNX](https://github.com/Lednik7/CLIP-ONNX). We kept our implementation simple, focused on working with data from [open-prompts](https://github.com/krea-ai/open-prompts), and prepared to run efficiently on a CPU.
# CLIP Search
CLIP is a multi-modal neural network that can encode both images and text in a common feature space. This means that we can create vectors that contain semantic information extracted from a text or an image, and we can compare these vectors with operations such as cosine similarity to obtain a similarity score.
As a high-level example, when CLIP extracts features from an image of a red car, it produces a vector similar to the one it creates when it sees the text "a red car" or an image of another red car, since the semantics of all these elements are related.
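
As a minimal sketch of this idea, the snippet below encodes one image and one prompt with OpenAI's `clip` package and compares them with cosine similarity; the model name, image path, and prompt are illustrative, not specific to this project.

```python
import clip
import torch
from PIL import Image

# Load a CLIP model; "ViT-B/32" is an illustrative choice.
device = "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Encode an image and a text description into the shared feature space.
image = preprocess(Image.open("red_car.png")).unsqueeze(0).to(device)
text = clip.tokenize(["a red car"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

# Normalize and compare: a higher cosine similarity means closer semantics.
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
text_features = text_features / text_features.norm(dim=-1, keepdim=True)
print((image_features @ text_features.T).item())
```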
So far, CLIP has been helpful for creating datasets like [LAION-5B](https://laion.ai/blog/laion-5b/), guiding generative models like VQ-GAN, performing image classification when there is little labeled data, and serving as a backbone for AI models like Stable Diffusion.
## Semantic Search
Semantic search consists of finding similar items within a dataset by comparing feature vectors. These feature vectors are also known as embeddings, and they can be computed in different ways. CLIP is one of the most interesting models for extracting features for semantic search.
The search process consists of encoding items as embeddings, indexing them, and using these indices for fast search. Romain Beaumont wrote a great [Medium post](https://rom1504.medium.com/semantic-search-with-embeddings-index-anything-8fb18556443c) about semantic search; we highly recommend reading it.
With this code, you will compute embeddings using CLIP, index them using K-Nearest Neighbors, and search for similarities efficiently given an input CLIP embedding.
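
As a toy illustration of that pipeline, here is a sketch with [faiss](https://github.com/facebookresearch/faiss), using random vectors in place of real CLIP embeddings; the scripts described below do this on the actual dataset.

```python
import faiss
import numpy as np

# Pretend these are 512-dimensional CLIP embeddings for 1,000 items.
dim = 512
embeddings = np.random.rand(1000, dim).astype("float32")
faiss.normalize_L2(embeddings)  # after normalization, inner product == cosine similarity

# Index the embeddings for K-Nearest Neighbors search.
index = faiss.IndexFlatIP(dim)
index.add(embeddings)

# Query with one embedding and retrieve the 5 most similar items.
scores, ids = index.search(embeddings[:1], 5)
print(ids[0], scores[0])
```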
# Environment
# Data Preparation
We will use a dataset of prompts generated with Stable Diffusion from [open-prompts](https://github.com/krea-ai/open-prompts). `1k.csv` is a subset of a [larger dataset](https://drive.google.com/file/d/1c4WHxtlzvHYd0UY5WCMJNn2EO-Aiv2A0/view) that you can find there, and it is perfect for testing.
# CLIP Search
## Search Images
### Download Images
The first step consists of downloading the images from the `csv` file that contains all the prompt data. To do so, we will leverage the [img2dataset](https://github.com/rom1504/img2dataset) package.
Execute the following command to create a new file with the image links:
```
python extract_img_links_from_csv.py
```
Note that by default, the process will create the links from `1k.csv`. Change the `CSV_FILE` variable in `extract_img_links_from_csv.py` if you want to use another data file as input.
```python
CSV_FILE="./1k.csv"
OUTPUT_FILE="./img_links.txt"
```
The results will be stored in `img_links.txt`.
Now, download the images listed in `img_links.txt` using `img2dataset`.
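
As a rough sketch, this step can be expressed with img2dataset's Python API; the parameter values below are illustrative assumptions rather than the project's exact configuration.

```python
from img2dataset import download

# Download every URL listed in img_links.txt into the imgs folder.
# img2dataset writes its output in numbered shards, which is why the
# images end up inside a sub-folder such as 00000.
download(
    url_list="./img_links.txt",
    input_format="txt",
    output_folder="./imgs",
    output_format="files",  # save plain image files on disk
    image_size=256,         # illustrative resize value
    processes_count=4,
    thread_count=16,
)
```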
The output will be stored in a sub-folder named `00000` within `imgs`.
### Compute Visual CLIP Embeddings
Once the folder `imgs` is created and filled with generated images, you can run the following command to compute visual CLIP embeddings for each of them:
`python extract_visual_clip_embeddings.py`
The following are the main parameters that you might need to change in `extract_visual_clip_embeddings.py`:
```python
IMG_DIR = "./imgs/00000"  # directory where all your images were downloaded
BATCH_SIZE = 128  # number of CLIP embeddings computed at each iteration
NUM_WORKERS = 14  # number of workers that run in parallel (recommended: number_of_cores - 2)
```
Once the process is finished, you will see a new folder named `visual_embeddings`. This folder will contain two other folders named `ids` and `embeddings`. `ids` will contain `.npy` files with the `ids` of the generations processed in each batch. `embeddings` will contain `.npy` files with the embeddings computed in each batch. This data will be useful for computing the KNN indices.
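
For reference, these batch files can be loaded back and stacked with numpy; pairing `ids` and `embeddings` by sorted filename order is an assumption made for this sketch.

```python
import glob
import numpy as np

# Each batch produced one .npy file of ids and one of embeddings; stack them back together.
embedding_files = sorted(glob.glob("visual_embeddings/embeddings/*.npy"))
id_files = sorted(glob.glob("visual_embeddings/ids/*.npy"))

embeddings = np.concatenate([np.load(f) for f in embedding_files])
ids = np.concatenate([np.load(f) for f in id_files])
print(embeddings.shape, ids.shape)  # one embedding per id
```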
### Compute Visual KNN Indices
If you did not make any modifications to the default output structure from the previous step, this process should be as easy as running the following command:
`python create_visual_knn_indices.py`
Otherwise, you might want to modify the following variables in `create_visual_knn_indices.py`:
```python
INDICES_FOLDER="knn_indices"
EMBEDDINGS_DIR="visual_embeddings"
```
The result will be stored within a new folder named `knn_indices` in a file named `visual_prompts.index`.
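
This project credits [autofaiss](https://github.com/criteo/autofaiss); as a hedged sketch, an equivalent index could be built from the folder of `.npy` embeddings like this (the actual script may differ, and the `index_infos` filename is illustrative).

```python
from autofaiss import build_index

# Build a KNN index from every .npy file in the embeddings folder and
# write it to the location expected by the search scripts.
build_index(
    embeddings="visual_embeddings/embeddings",
    index_path="knn_indices/visual_prompts.index",
    index_infos_path="knn_indices/visual_prompts_infos.json",
    metric_type="ip",  # inner product, i.e. cosine similarity on normalized vectors
)
```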
### Search Images
In order to search generated images more efficiently, we will use an ONNX version of CLIP, relying on the implementation from [`CLIP-ONNX`](https://github.com/Lednik7/CLIP-ONNX).
Finally, execute the following line to perform the search with regular CLIP and ONNX CLIP:
```
python test_visual_knn_index.py
```
The result should be a list of image filenames that are the most similar to the prompt `"image of a blue robot with red background"` and the image `prompt-search.png`.
Change the following parameters in `test_visual_knn_index.py` to try out different input prompts and images:
```python
INPUT_IMG_PATH="./prompt-search.png"
INPUT_PROMPT="image of a blue robot with red background"
NUM_RESULTS=5
```
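
Under the hood, the search boils down to encoding the query with CLIP and asking the index for its nearest neighbors. The sketch below uses the plain `clip` and `faiss` packages with the default paths from the previous steps; the model choice is illustrative and the id-to-filename mapping handled by the script is omitted.

```python
import clip
import faiss
import torch

# Load the index built earlier and a CLIP model for encoding the query.
index = faiss.read_index("knn_indices/visual_prompts.index")
model, preprocess = clip.load("ViT-B/32", device="cpu")  # illustrative model choice

# Encode and normalize the prompt so that inner product == cosine similarity.
with torch.no_grad():
    query = model.encode_text(clip.tokenize(["image of a blue robot with red background"]))
    query = query / query.norm(dim=-1, keepdim=True)

# Retrieve the 5 nearest neighbors; the ids map back to the indexed images.
scores, ids = index.search(query.numpy().astype("float32"), 5)
print(ids[0], scores[0])
```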
Have fun!
# Get in touch
- Follow and DM us on Twitter: [@krea_ai](https://twitter.com/krea_ai)