<!-- ABOUT THE PROJECT -->
# About
This is the code that we used to build our CLIP semantic search engine for [krea.ai](https://krea.ai). This work is heavily inspired by [clip-retrieval](https://github.com/rom1504/clip-retrieval/), [autofaiss](https://github.com/criteo/autofaiss), and [CLIP-ONNX](https://github.com/Lednik7/CLIP-ONNX). We kept our implementation simple, focused on working with data from [open-prompts](https://github.com/krea-ai/open-prompts), and prepared to run efficiently on a CPU.
# CLIP Search
CLIP is a multi-modal neural network that can encode both images and text in a common feature space. This means that we can create vectors that contain semantic information extracted from a text or an image, and we can compare these vectors with operations such as cosine similarity to obtain a similarity score.
As a high-level example, when CLIP extracts features from an image of a red car, it produces a vector similar to the one it creates when it sees the text "a red car" or an image of another red car, since the semantics of all these elements are related.
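
As a minimal sketch of this idea, the snippet below encodes one image and one prompt with OpenAI's `clip` package and compares them with cosine similarity; the model name, image path, and prompt are illustrative, not specific to this project.

```python
import clip
import torch
from PIL import Image

# Load a CLIP model; "ViT-B/32" is an illustrative choice.
device = "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Encode an image and a text description into the shared feature space.
image = preprocess(Image.open("red_car.png")).unsqueeze(0).to(device)
text = clip.tokenize(["a red car"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

# Normalize and compare: a higher cosine similarity means closer semantics.
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
text_features = text_features / text_features.norm(dim=-1, keepdim=True)
print((image_features @ text_features.T).item())
```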
So far, CLIP has been helpful for creating datasets like [LAION-5B](https://laion.ai/blog/laion-5b/), guiding generative models like VQ-GAN, performing image classification when there is little labeled data, and serving as a backbone for AI models like Stable Diffusion.
## Semantic Search
Semantic search consists of finding similar items within a dataset by comparing feature vectors. These feature vectors are also known as embeddings, and they can be computed in different ways. CLIP is one of the most interesting models for extracting features for semantic search.
The search process consists of encoding items as embeddings, indexing them, and using these indices for fast search. Romain Beaumont wrote a great [Medium post](https://rom1504.medium.com/semantic-search-with-embeddings-index-anything-8fb18556443c) about semantic search; we highly recommend reading it.
With this code, you will compute embeddings using CLIP, index them using K-Nearest Neighbors, and search for similarities efficiently given an input CLIP embedding.
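
As a toy illustration of that pipeline, here is a sketch with [faiss](https://github.com/facebookresearch/faiss), using random vectors in place of real CLIP embeddings; the scripts described below do this on the actual dataset.

```python
import faiss
import numpy as np

# Pretend these are 512-dimensional CLIP embeddings for 1,000 items.
dim = 512
embeddings = np.random.rand(1000, dim).astype("float32")
faiss.normalize_L2(embeddings)  # after normalization, inner product == cosine similarity

# Index the embeddings for K-Nearest Neighbors search.
index = faiss.IndexFlatIP(dim)
index.add(embeddings)

# Query with one embedding and retrieve the 5 most similar items.
scores, ids = index.search(embeddings[:1], 5)
print(ids[0], scores[0])
```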
# Environment
# Data Preparation
We will use a dataset of prompts generated with Stable Diffusion from [open-prompts](https://github.com/krea-ai/open-prompts). `1k.csv` is a subset of a [larger dataset](https://drive.google.com/file/d/1c4WHxtlzvHYd0UY5WCMJNn2EO-Aiv2A0/view) that you can find there, and it is perfect for testing.
# CLIP Search
## Search Images
### Download Images
The first step consists of downloading the images from the `csv` file that contains all the prompt data. To do so, we will leverage the [img2dataset](https://github.com/rom1504/img2dataset) package.
Execute the following command to create a new file with the image links:
```
python extract_img_links_from_csv.py
```
Note that by default, the process will create the links from `1k.csv`. Change the `CSV_FILE` variable in `extract_img_links_from_csv.py` if you want to use another data file as input.
```python
CSV_FILE="./1k.csv"
OUTPUT_FILE="./img_links.txt"
```
The results will be stored in `img_links.txt`.
Now, download the images listed in `img_links.txt` using `img2dataset`.
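
As a rough sketch, this step can be expressed with img2dataset's Python API; the parameter values below are illustrative assumptions rather than the project's exact configuration.

```python
from img2dataset import download

# Download every URL listed in img_links.txt into the imgs folder.
# img2dataset writes its output in numbered shards, which is why the
# images end up inside a sub-folder such as 00000.
download(
    url_list="./img_links.txt",
    input_format="txt",
    output_folder="./imgs",
    output_format="files",  # save plain image files on disk
    image_size=256,         # illustrative resize value
    processes_count=4,
    thread_count=16,
)
```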
The output will be stored in a sub-folder named `00000` within `imgs`.
### Compute Visual CLIP Embeddings
Once the folder `imgs` is created and filled with generated images, you can run the following command to compute visual CLIP embeddings for each of them:
`python extract_visual_clip_embeddings.py`
The following are the main parameters that you might need to change in `extract_visual_clip_embeddings.py`:
```python
IMG_DIR = "./imgs/00000"  # directory where all your images were downloaded
BATCH_SIZE = 128  # number of CLIP embeddings computed at each iteration
NUM_WORKERS = 14  # number of workers that run in parallel (recommended: number_of_cores - 2)
```
Once the process is finished, you will see a new folder named `visual_embeddings`. This folder will contain two other folders named `ids` and `embeddings`. `ids` will contain `.npy` files with the `ids` of the generations processed in each batch. `embeddings` will contain `.npy` files with the embeddings computed in each batch. This data will be useful for computing the KNN indices.
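
For reference, these batch files can be loaded back and stacked with numpy; pairing `ids` and `embeddings` by sorted filename order is an assumption made for this sketch.

```python
import glob
import numpy as np

# Each batch produced one .npy file of ids and one of embeddings; stack them back together.
embedding_files = sorted(glob.glob("visual_embeddings/embeddings/*.npy"))
id_files = sorted(glob.glob("visual_embeddings/ids/*.npy"))

embeddings = np.concatenate([np.load(f) for f in embedding_files])
ids = np.concatenate([np.load(f) for f in id_files])
print(embeddings.shape, ids.shape)  # one embedding per id
```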
### Compute Visual KNN Indices
If you did not make any modifications to the default output structure from the previous step, this process should be as easy as running the following command:
`python create_visual_knn_indices.py`
Otherwise, you might want to modify the following variables in `create_visual_knn_indices.py`:
```python
INDICES_FOLDER="knn_indices"
EMBEDDINGS_DIR="visual_embeddings"
```
The result will be stored within a new folder named `knn_indices` in a file named `visual_prompts.index`.
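
This project credits [autofaiss](https://github.com/criteo/autofaiss); as a hedged sketch, an equivalent index could be built from the folder of `.npy` embeddings like this (the actual script may differ, and the `index_infos` filename is illustrative).

```python
from autofaiss import build_index

# Build a KNN index from every .npy file in the embeddings folder and
# write it to the location expected by the search scripts.
build_index(
    embeddings="visual_embeddings/embeddings",
    index_path="knn_indices/visual_prompts.index",
    index_infos_path="knn_indices/visual_prompts_infos.json",
    metric_type="ip",  # inner product, i.e. cosine similarity on normalized vectors
)
```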
### Search Images
In order to search generated images more efficiently, we will use an ONNX version of CLIP, relying on the implementation from [`CLIP-ONNX`](https://github.com/Lednik7/CLIP-ONNX).
Finally, execute the following line to perform the search with regular CLIP and ONNX CLIP:
```
python test_visual_knn_index.py
```
The result should be a list of image filenames that are the most similar to the prompt `"image of a blue robot with red background"` and the image `prompt-search.png`.
Change the following parameters in `test_visual_knn_index.py` to try out different input prompts and images:
```python
INPUT_IMG_PATH="./prompt-search.png"
INPUT_PROMPT="image of a blue robot with red background"
NUM_RESULTS=5
```
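
Under the hood, the search boils down to encoding the query with CLIP and asking the index for its nearest neighbors. The sketch below uses the plain `clip` and `faiss` packages with the default paths from the previous steps; the model choice is illustrative and the id-to-filename mapping handled by the script is omitted.

```python
import clip
import faiss
import torch

# Load the index built earlier and a CLIP model for encoding the query.
index = faiss.read_index("knn_indices/visual_prompts.index")
model, preprocess = clip.load("ViT-B/32", device="cpu")  # illustrative model choice

# Encode and normalize the prompt so that inner product == cosine similarity.
with torch.no_grad():
    query = model.encode_text(clip.tokenize(["image of a blue robot with red background"]))
    query = query / query.norm(dim=-1, keepdim=True)

# Retrieve the 5 nearest neighbors; the ids map back to the indexed images.
scores, ids = index.search(query.numpy().astype("float32"), 5)
print(ids[0], scores[0])
```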
Have fun!
# Get in touch
- Follow and DM us on Twitter: [@krea_ai](https://twitter.com/krea_ai)