Commit 569404b

feat: Adding Docling RAG demo (#5109)
* feat: Adding Docling RAG demo
* updated demo
* cleaned up notebook
* adding chunk id
* adding quickstart demo that is WIP and updating docling-demo to export unique chunk-id
* adding current tentative example repo
* adding current temporary work
* updating demo script to rename things
* updated quickstart
* added comment
* checking in progress
* checking in progress for now, still have some issues with vector retrieval
* okay think i have most things working
* removing commenting and unnecessary code
* uploading demo
* uploading other files
* updated repo example
* checking in current notebook, almost there
* fixed linter
* fixed transformation logic
* removed print
* added README with description
* removing print
* updating
* updating metadata file
* updated readme and adding dataset

Signed-off-by: Francisco Javier Arceo <[email protected]>
1 parent 955521a commit 569404b

6 files changed: +2234 -0 lines changed


examples/rag-docling/README.md

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
# 🚀 Quickstart: RAG, Milvus, and Docling with Feast

This project demonstrates how to use **Feast** to power a **Retrieval-Augmented Generation (RAG)** application.

In particular, this example expands on the basic RAG demo to show:
1. How to transform PDFs into text data with [Docling](https://docling-project.github.io/docling/) that can be used by LLMs
2. How to use [Milvus](https://milvus.io/) as a vector database to store and retrieve embeddings for RAG
3. How to transform PDFs with Docling during ingestion

## 💡 Why Use Feast for RAG?

- **Online retrieval of features:** Ensure real-time access to precomputed document embeddings and other structured data.
- **Declarative feature definitions:** Define feature views and entities in a Python file so that data scientists can easily ship scalable RAG applications with all of the existing benefits of Feast.
- **Vector search:** Leverage Feast’s integration with vector databases like **Milvus** to find relevant documents based on a similarity metric (e.g., cosine).
- **Structured and unstructured context:** Retrieve both embeddings and traditional features, injecting richer context into LLM prompts.
- **Versioning and reusability:** Collaborate across teams with discoverable, versioned feature transformations.
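
To make the vector-search and context-injection bullets concrete, here is a minimal plain-Python sketch of what similarity-based retrieval does conceptually. It is not Feast's or Milvus's API: the function names, the toy three-dimensional vectors, and the `text`/`source` fields are all hypothetical and exist only for illustration. A real deployment would delegate the search to the vector database.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query_vector, documents, k=2):
    """Return the k documents most similar to the query, best first."""
    scored = [(cosine_similarity(query_vector, doc["vector"]), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def build_prompt(question, retrieved):
    """Inject retrieved text plus a structured field (the source) into a prompt."""
    context = "\n".join(f"- {doc['text']} (source: {doc['source']})" for doc in retrieved)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical toy corpus; real embeddings have hundreds of dimensions.
DOCUMENTS = [
    {"text": "Docling converts PDFs to text.", "source": "a.pdf", "vector": [1.0, 0.0, 0.0]},
    {"text": "Milvus stores embeddings.", "source": "b.pdf", "vector": [0.0, 1.0, 0.0]},
    {"text": "Feast serves features online.", "source": "c.pdf", "vector": [0.7, 0.7, 0.0]},
]

hits = top_k([1.0, 0.1, 0.0], DOCUMENTS, k=2)
prompt = build_prompt("What does Docling do?", hits)
```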

---

## 📂 Project Structure

- **`data/`**: Contains the demo data, including Wikipedia summaries of cities with sentence embeddings stored in a Parquet file.
  - Note: you have to run `docling-demo.ipynb` to construct the `docling_samples.parquet` file; the `metadata_samples.parquet` file is provided for you.
- **`example_repo.py`**: Defines the feature views and entity configurations for Feast.
- **`feature_store.yaml`**: Configures the offline and online stores (using local files and Milvus Lite in this demo).

The project has two main notebooks:
1. [`docling-demo.ipynb`](./docling-demo.ipynb): Demonstrates how to use Docling to extract text from PDFs and store the text in a Parquet file.
2. [`docling-quickstart.ipynb`](./docling-quickstart.ipynb): Shows how to use Feast to ingest the text data, store it in the online store, and retrieve it.
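
The commit notes mention exporting a unique chunk ID from the Docling demo. The notebooks themselves are not shown here, so the following is only a sketch of one way extracted text could be split into chunks with stable IDs before being written to Parquet. The fixed-size word chunking, the SHA-256-based ID, and the column names (`file_name`, `chunk_id`, `raw_chunk_markdown`) are all assumptions made for illustration, not the notebook's actual logic.

```python
import hashlib


def chunk_text(text, max_words=50):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def chunk_id(file_name, index, chunk):
    """Deterministic ID derived from the file name, chunk position, and content."""
    payload = f"{file_name}:{index}:{chunk}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]


def to_rows(file_name, text, max_words=50):
    """Build one record per chunk; column names here are hypothetical."""
    return [
        {
            "file_name": file_name,
            "chunk_id": chunk_id(file_name, i, chunk),
            "raw_chunk_markdown": chunk,
        }
        for i, chunk in enumerate(chunk_text(text, max_words))
    ]


rows = to_rows("paper.pdf", "one two three four five", max_words=2)
```

Because each ID depends only on the file name, position, and content, re-running the extraction on the same input yields the same IDs, which keeps re-ingestion idempotent. Writing `rows` to a Parquet file (for example with pandas) is omitted to keep the sketch dependency-free.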
