| 1 | +# 🚀 Quickstart: RAG, Milvus, and Docling with Feast |
| 2 | + |
| 3 | +This project demonstrates how to use **Feast** to power a **Retrieval-Augmented Generation (RAG)** application. |
| 4 | + |
| 5 | +In particular, this example expands on the basic RAG demo to show: |
| 6 | +1. How to transform PDFs into text data with [Docling](https://docling-project.github.io/docling/) that can be used by LLMs |
| 7 | +2. How to use [Milvus](https://milvus.io/) as a vector database to store and retrieve embeddings for RAG |
| 8 | +3. How to transform PDFs with Docling during ingestion |
| 9 | + |
| 10 | +## 💡 Why Use Feast for RAG? |
| 11 | + |
| 12 | +- **Online retrieval of features:** Ensure real-time access to precomputed document embeddings and other structured data. |
| 13 | +- **Declarative feature definitions:** Define feature views and entities in a Python file so that data scientists can ship scalable RAG applications with all of the existing benefits of Feast. |
| 14 | +- **Vector search:** Leverage Feast’s integration with vector databases like **Milvus** to find relevant documents based on a similarity metric (e.g., cosine). |
| 15 | +- **Structured and unstructured context:** Retrieve both embeddings and traditional features, injecting richer context into LLM prompts. |
| 16 | +- **Versioning and reusability:** Collaborate across teams with discoverable, versioned feature transformations. |
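
To make the vector-search bullet above concrete, here is a minimal pure-Python sketch of cosine-similarity retrieval. It is not Feast's or Milvus's API; the function names and the toy two-dimensional vectors are assumptions chosen for illustration. A real deployment would delegate this ranking to the vector database.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Treat a zero vector as having no similarity to anything.
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, documents, k=2):
    """Rank documents (id -> embedding) by similarity to the query vector."""
    scored = [
        (doc_id, cosine_similarity(query, vector))
        for doc_id, vector in documents.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real sentence embeddings have hundreds of dimensions.
documents = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
results = top_k([1.0, 0.0], documents, k=2)
```

The top-ranked documents are the ones whose text would be injected into the LLM prompt as context.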
| 17 | + |
| 18 | +--- |
| 19 | + |
| 20 | +## 📂 Project Structure |
| 21 | + |
| 22 | +- **`data/`**: Contains the demo data, including Wikipedia summaries of cities with sentence embeddings stored in a Parquet file. |
| 23 | + - Note: you must run `docling-demo.ipynb` to build the `docling_samples.parquet` file; the `metadata_samples.parquet` file is provided for you. |
| 24 | +- **`example_repo.py`**: Defines the feature views and entity configurations for Feast. |
| 25 | +- **`feature_store.yaml`**: Configures the offline and online stores (using local files and Milvus Lite in this demo). |
| 26 | + |
| 27 | +The project has two main notebooks: |
| 28 | +1. [`docling-demo.ipynb`](./docling-demo.ipynb): Demonstrates how to use Docling to extract text from PDFs and store the text in a Parquet file. |
| 29 | +2. [`docling-quickstart.ipynb`](./docling-quickstart.ipynb): Shows how to use Feast to ingest the text data and store and retrieve it from the online store. |
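
The first notebook's PDF-to-Parquet step can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's actual code: the helper names (`docling_markdown`, `build_rows`, `write_parquet`) and the column names are hypothetical. The Docling calls follow its basic `DocumentConverter` usage, and `write_parquet` assumes pandas with a Parquet engine such as pyarrow is installed.

```python
from pathlib import Path


def docling_markdown(pdf_path):
    """Convert one PDF to Markdown text with Docling (imported lazily)."""
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(pdf_path)
    return result.document.export_to_markdown()


def build_rows(pdf_paths, extract=docling_markdown):
    """Build one record per PDF; `extract` is swappable so the logic can be tested offline."""
    rows = []
    for index, pdf_path in enumerate(pdf_paths):
        rows.append(
            {
                "document_id": f"doc-{index}",  # hypothetical ID scheme
                "file_name": Path(pdf_path).name,
                "raw_markdown": extract(pdf_path),
            }
        )
    return rows


def write_parquet(rows, out_path):
    """Persist the records as a Parquet file (requires pandas and pyarrow)."""
    import pandas as pd

    pd.DataFrame(rows).to_parquet(out_path, index=False)
```

Under these assumptions, a call such as `write_parquet(build_rows(pdf_paths), "data/docling_samples.parquet")` would produce the file that the second notebook ingests with Feast.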