Fiona-Waters
Contributor

@Fiona-Waters Fiona-Waters commented Jun 30, 2025

Description

Following the merge of two PRs in the Feast repo (feast-dev/feast#5405, feast-dev/feast#5470), this PR updates the Feast RAG example to match those changes.

How Has This Been Tested?

I have tested this manually on the team cluster, by cloning this PR and running the notebook.

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

Summary by CodeRabbit

  • Documentation
    • Updated the README to remove detailed module descriptions and correct a filename reference.
  • Bug Fixes
    • Fixed configuration by updating the Milvus host URL format in the feature store settings.
    • Corrected file paths and import statements in the example notebook to match recent module and directory changes.
  • Chores
    • Removed the custom retriever module to align with upstream changes and simplify maintenance.
    • Updated dependency installation steps and reorganized data storage locations in the example notebook.
    • Revised dataset chunking method to preserve whole words within character limits for improved text processing.


coderabbitai bot commented Jun 30, 2025

Walkthrough

This update removes the custom Feast RAG retriever implementation and its supporting classes, refactors the notebook to use Feast’s built-in retriever and vector store modules, and updates configuration and documentation accordingly. The notebook now simplifies dependency installation, reorganizes data paths, adjusts imports and initialization to align with the new Feast integration, and clarifies dataset chunking logic.

Changes

| File(s) | Change Summary |
| --- | --- |
| examples/kfto_feast_rag/feast_rag_retriever.py | Entire file deleted. Removed custom classes for Feast-backed RAG retriever, vector store, and index. |
| examples/kfto_feast_rag/README.md | Removed detailed description of the deleted retriever module; updated vector store and retriever descriptions to use Feast’s built-in FeastVectorStore and FeastRAGRetriever; fixed filename from rag_project_repo.py to ragproject_repo.py. |
| examples/kfto_feast_rag/feature_repo/feature_store.yaml | Updated Milvus online store host to include the "http://" prefix. |
| examples/kfto_feast_rag/rag_feast_kfto.ipynb | Simplified dependency installation by removing faiss-cpu; added markdown explaining chunking; rewrote chunking function to avoid word truncation with a 380-character limit; created feature_repo/data/ directory for parquet file storage; changed imports to feast.vector_store and feast.rag_retriever; corrected feature view import path; updated initialization parameters for FeastVectorStore, FeastIndex, and FeastRAGRetriever to match new API signatures. |

Poem

A retriever once custom, now gone from the code,
Feast’s own modules now lighten the load.
Imports are tidier, the notebook’s more neat,
Data in new places, the setup’s complete.
With “http://” in configs, the host’s crystal clear—
The rabbits all cheer: “Feast integration is here!”
🐇✨


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
examples/kfto_feast_rag/README.md (1)

48-48: Fix markdown formatting consistency.

The static analysis tool has identified formatting inconsistencies in this line.

Apply this diff to fix the markdown formatting:

-* **__feature_repo/ragproject_repo.py__**
+- **feature_repo/ragproject_repo.py**
examples/kfto_feast_rag/rag_feast_kfto.ipynb (1)

28-38: Consider pinning to a specific commit for reproducibility.

Installing from the master branch is appropriate to access the latest RAG features. However, consider pinning to a specific commit hash for better reproducibility.

For better reproducibility, consider pinning to a specific commit:

-%pip install git+https://github.com/feast-dev/feast.git@master
+%pip install git+https://github.com/feast-dev/feast.git@<specific_commit_hash>
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 59fac38 and d5e672b.

📒 Files selected for processing (4)
  • examples/kfto_feast_rag/README.md (1 hunks)
  • examples/kfto_feast_rag/feast_rag_retriever.py (0 hunks)
  • examples/kfto_feast_rag/feature_repo/feature_store.yaml (1 hunks)
  • examples/kfto_feast_rag/rag_feast_kfto.ipynb (6 hunks)
💤 Files with no reviewable changes (1)
  • examples/kfto_feast_rag/feast_rag_retriever.py
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
examples/kfto_feast_rag/README.md

48-48: Unordered list style
Expected: dash; Actual: asterisk

(MD004, ul-style)


48-48: Strong style
Expected: asterisk; Actual: underscore

(MD050, strong-style)


48-48: Strong style
Expected: asterisk; Actual: underscore

(MD050, strong-style)

🔇 Additional comments (10)
examples/kfto_feast_rag/feature_repo/feature_store.yaml (1)

6-6: LGTM! Proper URL scheme added.

Adding the "http://" prefix to the host configuration improves clarity and aligns with standard URL formatting expectations for the Milvus connection.
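
The effect of this change can be sketched in a few lines of Python. This is only an illustration of adding a scheme to a bare host value: `ensure_scheme` is a hypothetical helper (not part of Feast or any Milvus client), and the host name in the usage line is made up.

```python
def ensure_scheme(host: str, default_scheme: str = "http") -> str:
    """Return the host unchanged if it already carries a scheme,
    otherwise prefix it with default_scheme."""
    if "://" in host:
        return host
    return f"{default_scheme}://{host}"


# Hypothetical host value, for illustration only.
normalized = ensure_scheme("milvus.example.svc")
```

A value that already starts with `http://` or `https://` passes through untouched, so the helper is safe to apply more than once.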

examples/kfto_feast_rag/README.md (1)

48-48: Filename correction aligns with notebook imports.

Good catch! This correction ensures consistency with the import path used in the notebook (ragproject_repo).

examples/kfto_feast_rag/rag_feast_kfto.ipynb (8)

23-23: Dependency simplification approved.

Removing faiss-cpu from the installation is appropriate since the updated Feast integration leverages Milvus for vector operations directly.


138-138: Data organization improvement.

Creating a dedicated data directory within feature_repo improves project structure and organization.


166-166: File path updated to match new directory structure.

The updated path correctly uses the new feature_repo/data directory structure.


252-254: Import updates reflect Feast RAG integration.

The updated imports correctly use the new Feast-provided RAG modules instead of the custom implementation. The filename correction (ragproject_repo) maintains consistency with the actual file structure.


267-269: Simplified API initialization with enhanced features.

The updated FeastVectorStore initialization uses a cleaner API with repo_path="." and includes the necessary passage_id feature for proper retrieval functionality.


272-272: Cleaner FeastIndex initialization.

The simplified initialization without the vector_store argument reflects improved API design in the updated Feast integration.


284-285: More explicit retriever configuration.

The updated parameters (feature_view and features) provide clearer and more direct configuration compared to inferring these from a vector store object.


289-289: Text field specification for retrieval.

Adding the text_field="passage_text" parameter is essential for the retriever to correctly identify the text content field for RAG functionality.

@Fiona-Waters Fiona-Waters requested review from ChughShilpa and efazal and removed request for Bobbins228 June 30, 2025 09:37
@Fiona-Waters Fiona-Waters force-pushed the update-feast-example branch from d5e672b to c9e6cff Compare June 30, 2025 10:42
Contributor

@sutaakar sutaakar left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Jun 30, 2025
@Fiona-Waters Fiona-Waters force-pushed the update-feast-example branch from c9e6cff to 240d412 Compare July 2, 2025 10:56
@openshift-ci openshift-ci bot removed the lgtm label Jul 2, 2025

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🔭 Outside diff range comments (1)
examples/kfto_feast_rag/README.md (1)

51-59: README still claims a “custom” vector store / retriever — this is now stale

PR description says the bespoke FeastRAGRetriever has been removed in favour of Feast’s built-in components, yet the bullet list continues to advertise:

- **Vector Store**: Custom implementation with Feast integration
- **Retriever**: Custom implementation with Feast integration extending HuggingFace's RagRetriever
...
- Inference utilising a custom RagRetriever: FeastRagRetriever

Leaving this unchanged will confuse users who try to hunt down code that no longer exists. Update the wording to reflect the built-in Feast retriever/vector-store and remove references to the deleted FeastRAGRetriever.

Example fix:

-   - **Vector Store**: Custom implementation with Feast integration
-   - **Retriever**: Custom implementation with Feast integration extending HuggingFace's RagRetriever
+   - **Vector Store**: Feast’s built-in vector store backed by Milvus
+   - **Retriever**: Feast’s native retriever (no custom subclass needed)
...
-    - Inference utilising a custom RagRetriever: FeastRagRetriever
+    - Inference using Feast’s built-in retriever
🧹 Nitpick comments (1)
examples/kfto_feast_rag/README.md (1)

48-50: Fix Markdown style & remove redundant underscores around filename

The double-underscore trick isn’t needed when the line is already bolded, and markdownlint is complaining about the * list marker.

-* **__feature_repo/ragproject_repo.py__**
+ - **feature_repo/ragproject_repo.py**
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c9e6cff and 240d412.

📒 Files selected for processing (4)
  • examples/kfto_feast_rag/README.md (1 hunks)
  • examples/kfto_feast_rag/feast_rag_retriever.py (0 hunks)
  • examples/kfto_feast_rag/feature_repo/feature_store.yaml (1 hunks)
  • examples/kfto_feast_rag/rag_feast_kfto.ipynb (7 hunks)
💤 Files with no reviewable changes (1)
  • examples/kfto_feast_rag/feast_rag_retriever.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • examples/kfto_feast_rag/feature_repo/feature_store.yaml
  • examples/kfto_feast_rag/rag_feast_kfto.ipynb

@ChughShilpa
Contributor

/lgtm, just a nitpick

RAG System Implementation

Embedding Model: all-MiniLM-L6-v2 (configurable)
Generator Model: granite-3.2-2b-instruct (configurable)
Vector Store: Custom implementation with Feast integration
Retriever: Custom implementation with Feast integration extending HuggingFace's RagRetriever

Maybe it would be good to add the names of the vector store and retriever to the README file.

@efazal
Contributor

efazal commented Jul 2, 2025

/lgtm

@Fiona-Waters Fiona-Waters force-pushed the update-feast-example branch from 240d412 to 99d68d9 Compare July 2, 2025 11:28

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
examples/kfto_feast_rag/README.md (2)

23-26: Tighten wording & formatting for model / store / retriever bullets

– Minor Markdown polish: wrap class names in back-ticks so they render as code, drop the possessive “Feast’s” (already implied by the names), and keep wording parallel.
– No functional change, but this avoids future lint warnings and improves readability.

-   - **Generator Model**: `granite-3.2-2b-instruct` (configurable) 
-   - **Vector Store**: Feast’s built-in FeastVectorStore backed by Milvus
-   - **Retriever**: Feast’s native rag retriever FeastRAGRetriever
+   - **Generator Model**: `granite-3.2-2b-instruct` (configurable)
+   - **Vector Store**: `FeastVectorStore` (Milvus backend)
+   - **Retriever**: `FeastRAGRetriever`

48-50: Fix filename markup & satisfy markdown-lint

The double underscores around feature_repo/ragproject_repo.py are interpreted as “bold & italics” by Markdown and trigger MD050 / MD004.
Prefer inline-code styling for paths and keep the bullet consistent with the others.

-* **__feature_repo/ragproject_repo.py__**
+* `feature_repo/ragproject_repo.py`
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 240d412 and 99d68d9.

📒 Files selected for processing (4)
  • examples/kfto_feast_rag/README.md (2 hunks)
  • examples/kfto_feast_rag/feast_rag_retriever.py (0 hunks)
  • examples/kfto_feast_rag/feature_repo/feature_store.yaml (1 hunks)
  • examples/kfto_feast_rag/rag_feast_kfto.ipynb (7 hunks)
💤 Files with no reviewable changes (1)
  • examples/kfto_feast_rag/feast_rag_retriever.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • examples/kfto_feast_rag/feature_repo/feature_store.yaml
  • examples/kfto_feast_rag/rag_feast_kfto.ipynb

🔇 Additional comments (1)
examples/kfto_feast_rag/README.md (1)

54-60: Inconsistent wording: still refers to a “custom RagRetriever”

Earlier you state the example now uses the built-in FeastRAGRetriever; calling it “custom” here is confusing. Please align with the updated wording.

-    - Inference utilising a custom RagRetriever: FeastRagRetriever
+    - Inference utilising the built-in `FeastRAGRetriever`

@Fiona-Waters Fiona-Waters requested a review from sutaakar July 2, 2025 11:43
@kapil27
Contributor

kapil27 commented Jul 2, 2025

/lgtm

@abhijeet-dhumal
Contributor

/lgtm

Contributor

@sutaakar sutaakar left a comment

/lgtm

@Fiona-Waters Fiona-Waters force-pushed the update-feast-example branch from 99d68d9 to a77fddf Compare July 2, 2025 15:24
@openshift-ci openshift-ci bot removed the lgtm label Jul 2, 2025

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
examples/kfto_feast_rag/README.md (1)

48-48: Fix markdown formatting issues.

Static analysis flagged several formatting inconsistencies on this line that should be addressed for consistency.

Apply this diff to fix the formatting issues:

-* **__feature_repo/ragproject_repo.py__**
+- **feature_repo/ragproject_repo.py**
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 99d68d9 and a77fddf.

📒 Files selected for processing (4)
  • examples/kfto_feast_rag/README.md (2 hunks)
  • examples/kfto_feast_rag/feast_rag_retriever.py (0 hunks)
  • examples/kfto_feast_rag/feature_repo/feature_store.yaml (1 hunks)
  • examples/kfto_feast_rag/rag_feast_kfto.ipynb (7 hunks)
💤 Files with no reviewable changes (1)
  • examples/kfto_feast_rag/feast_rag_retriever.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • examples/kfto_feast_rag/feature_repo/feature_store.yaml

🔇 Additional comments (10)
examples/kfto_feast_rag/README.md (1)

23-25: Component descriptions accurately reflect the migration to native Feast components.

The updated descriptions correctly document the transition from custom implementations to Feast's built-in components, which aligns with the broader refactoring goals of this PR.

examples/kfto_feast_rag/rag_feast_kfto.ipynb (9)

23-23: Installation simplified appropriately.

Removing the explicit faiss-cpu dependency streamlines the installation process while maintaining functionality through the Feast dependencies.


52-58: Clear explanation of the chunking strategy.

The added markdown cell provides valuable context about the chunking approach, explaining the character limit and word-boundary preservation strategy.


65-96: Improved chunking logic with better word boundary handling.

The new chunking implementation correctly preserves word boundaries and ensures chunks don't exceed the character limit, which is an improvement over the previous approach.
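
For readers without the notebook open, word-boundary chunking under a character limit can be sketched as follows. This is a minimal standalone illustration, not the notebook's actual function: the name `chunk_by_words` is an assumption, and only the 380-character limit comes from the change summary.

```python
def chunk_by_words(text: str, max_chars: int = 380) -> list[str]:
    """Split text into chunks of at most max_chars characters without
    cutting words. A single word longer than max_chars is kept whole
    and becomes its own oversized chunk."""
    chunks: list[str] = []
    current_words: list[str] = []
    current_len = 0
    for word in text.split():
        # Cost of adding this word: its length plus one joining space
        # when the chunk already holds at least one word.
        added = len(word) + (1 if current_words else 0)
        if current_words and current_len + added > max_chars:
            # The word does not fit: close the chunk and start a new one.
            chunks.append(" ".join(current_words))
            current_words, current_len = [word], len(word)
        else:
            current_words.append(word)
            current_len += added
    if current_words:
        # Flush the trailing partial chunk.
        chunks.append(" ".join(current_words))
    return chunks


sample_chunks = chunk_by_words("word " * 200)
```

Because the split happens only between words, joining the chunks back with spaces reproduces the original word sequence.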


140-147: Good practice to organize data in dedicated directory.

Creating a dedicated data directory within feature_repo improves project organization and separation of concerns.


174-174: Data path updated consistently with new directory structure.

The parquet file path correctly reflects the new feature_repo/data/ directory structure.
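
A small sketch of the directory layout described in the two comments above, using only the standard library. The helper name `parquet_path` and the file name in the usage line are hypothetical; the notebook may create the directory and build the path differently.

```python
import tempfile
from pathlib import Path


def parquet_path(repo_root: str, filename: str) -> Path:
    """Create feature_repo/data/ under repo_root if needed and return
    the path where a parquet file with the given name would be stored."""
    data_dir = Path(repo_root) / "feature_repo" / "data"
    data_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return data_dir / filename


# Usage with a throwaway root directory and a made-up file name.
with tempfile.TemporaryDirectory() as root:
    target = parquet_path(root, "example.parquet")
    data_dir_exists = target.parent.is_dir()
```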


260-262: Import statements correctly updated for native Feast components.

The imports now correctly reference the native Feast modules and the corrected feature repository filename, aligning with the migration away from custom implementations.


275-278: FeastVectorStore initialization updated appropriately.

The initialization now uses repo_path="." instead of a store object and includes the passage_id in the features list, which is consistent with the native Feast implementation.


280-280: FeastIndex initialization simplified correctly.

Removing the vector_store argument simplifies the initialization consistent with the native Feast implementation.


292-298: FeastRAGRetriever initialization updated with correct parameters.

The retriever initialization now uses the correct parameters for the native Feast implementation, including the explicit feature_view and features parameters and the new text_field argument.

" if current_chunk_words:\n",
" chunk_text = ' '.join(current_chunk_words)\n",
" all_chunks.append(chunk_text)\n",
" all_ids.append(f\"{examples['id'][i]}_{len(all_chunks)}\") # Unique ID for the chunk\n",

⚠️ Potential issue

Potential issue with chunk ID generation.

The chunk ID generation uses len(all_chunks) which includes the current chunk being added, potentially causing inconsistent or duplicate IDs.

Apply this diff to fix the ID generation:

-                    all_ids.append(f"{examples['id'][i]}_{len(all_chunks)}")  # Unique ID for the chunk
+                    all_ids.append(f"{examples['id'][i]}_{len(all_chunks) + 1}")  # Unique ID for the chunk
-            all_ids.append(f"{examples['id'][i]}_{len(all_chunks)}")  # Unique ID for the chunk
+            all_ids.append(f"{examples['id'][i]}_{len(all_chunks) + 1}")  # Unique ID for the chunk

Also applies to: 94-94

🤖 Prompt for AI Agents
In examples/kfto_feast_rag/rag_feast_kfto.ipynb at lines 83 and 94, the chunk ID
generation uses len(all_chunks) which counts the current chunk being added,
risking duplicate or inconsistent IDs. To fix this, replace len(all_chunks) with
the index of the chunk before appending it, ensuring each chunk ID is unique and
consistent by using the correct chunk index value.
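
One way to check the concern is to reproduce the quoted pattern in isolation. The sketch below assumes, as in the quoted notebook lines, that the chunk is appended to `all_chunks` before the ID is built, and that document IDs are unique within a batch. `build_ids` and the sample batch are hypothetical.

```python
def build_ids(doc_chunks: dict[str, list[str]], offset: int = 0) -> list[str]:
    """Mimic the quoted pattern: append the chunk first, then derive the
    ID from len(all_chunks). offset=1 models the suggested '+ 1' variant."""
    all_chunks: list[str] = []
    all_ids: list[str] = []
    for doc_id, chunks in doc_chunks.items():
        for chunk in chunks:
            all_chunks.append(chunk)
            all_ids.append(f"{doc_id}_{len(all_chunks) + offset}")
    return all_ids


batch = {"doc_a": ["c1", "c2"], "doc_b": ["c3"]}
ids_current = build_ids(batch)             # suffixes 1, 2, 3
ids_plus_one = build_ids(batch, offset=1)  # suffixes 2, 3, 4
```

Under those assumptions both variants yield distinct IDs, because `len(all_chunks)` grows by one with every append. The `+ 1` variant only shifts the numbering.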


openshift-ci bot commented Jul 2, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: astefanutti

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Jul 2, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit dd23862 into opendatahub-io:main Jul 2, 2025
4 checks passed