Support Hugging Face get_top_docs() Style Retrieval for RAG in retrieve_online_documents_v2 #5391

@ntkathole

Description

Is your feature request related to a problem? Please describe.

Extend the retrieve_online_documents_v2() implementation to support integration with the RagRetriever from Hugging Face Transformers, which expects an index that exposes a get_top_docs() method with a specific signature and output format.

https://huggingface.co/docs/transformers/model_doc/rag#transformers.RagRetriever

Add a new method (e.g., get_top_docs) to Feast, or provide a utility class such as FeastIndex that wraps retrieve_online_documents_v2 and returns RAG-compatible results.
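The core of such a wrapper is reshaping one search result into the (scores, ids, texts) row this issue describes. The sketch below assumes the per-query result has already been turned into a column-oriented dict (for example via `OnlineResponse.to_dict()`) with an ID column, a text column, and a `distance` column; those column names and the helper `to_rag_row` are hypothetical, not confirmed Feast output. Because metrics such as L2 rank smaller distances as closer, the sketch negates distances so that larger scores mean more relevant; that is a design choice of the sketch.

```python
from typing import Any, Dict, List, Tuple


def to_rag_row(
    columns: Dict[str, List[Any]],
    id_field: str,
    text_field: str,
    score_field: str = "distance",  # assumed name of the distance column
    n_docs: int = 5,
    is_distance: bool = True,
) -> Tuple[List[float], List[str], List[str]]:
    """Convert one query's column-oriented result into (scores, ids, texts)."""
    rows = zip(columns[score_field], columns[id_field], columns[text_field])
    # Turn distances into similarity-style scores (higher = more relevant).
    scored = [
        (-float(s) if is_distance else float(s), str(i), str(t))
        for s, i, t in rows
    ]
    # Best matches first, then keep at most n_docs results.
    scored.sort(key=lambda r: r[0], reverse=True)
    scored = scored[:n_docs]
    scores = [r[0] for r in scored]
    ids = [r[1] for r in scored]
    texts = [r[2] for r in scored]
    return scores, ids, texts
```

Document IDs are cast to `str` because the expected output uses string IDs even when the underlying entity key is an integer.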

The proposed get_top_docs(query_vectors, n_docs) method should have the following contract:

Input:
query_vectors: A tensor or list of vectors representing one or more queries (typically embeddings produced by an encoder model such as BERT).
n_docs: The number of top documents (e.g., text passages) to retrieve for each query.

Expected Output:

A tuple of three nested lists, each containing one inner list per query:

doc_scores: List[List[float]]  # similarity scores
doc_ids: List[List[str]]       # document IDs as strings
docs: List[List[str]]          # raw document text
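The contract above can be written down as a typing `Protocol` plus a shape check, which is handy for unit-testing any implementation. `RagTopDocs` and `check_top_docs` are hypothetical names; the check enforces only what this issue specifies (one inner list per query, aligned inner lists of at most `n_docs` items, float scores, string IDs and texts), not anything taken from the Transformers codebase.

```python
from typing import List, Protocol, Sequence, Tuple

# (doc_scores, doc_ids, docs), each with one inner list per query
TopDocs = Tuple[List[List[float]], List[List[str]], List[List[str]]]


class RagTopDocs(Protocol):
    """Structural type for objects exposing the proposed get_top_docs()."""

    def get_top_docs(
        self, query_vectors: Sequence[Sequence[float]], n_docs: int
    ) -> TopDocs: ...


def check_top_docs(result: TopDocs, num_queries: int, n_docs: int) -> None:
    """Raise ValueError if result does not match the expected output format."""
    doc_scores, doc_ids, docs = result
    if not (len(doc_scores) == len(doc_ids) == len(docs) == num_queries):
        raise ValueError("expected one inner list per query")
    for scores, ids, texts in zip(doc_scores, doc_ids, docs):
        if not (len(scores) == len(ids) == len(texts) <= n_docs):
            raise ValueError("inner lists must be aligned and hold at most n_docs items")
        if not all(isinstance(x, float) for x in scores):
            raise ValueError("scores must be floats")
        if not all(isinstance(x, str) for x in [*ids, *texts]):
            raise ValueError("document IDs and texts must be strings")
```

Any class with a matching `get_top_docs` method satisfies `RagTopDocs` structurally, so no inheritance from a Feast or Transformers base class is needed for type checking.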

Describe the solution you'd like

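# Proposed API; FeastIndex is the wrapper requested in this issue
index = FeastIndex(
    vector_store=feast_retriever,
    config=config,
    table=feature_view,                          # feature view holding the document embeddings
    requested_features=["metadata", "source"],   # extra features to fetch with each document
    text_field="document"                        # feature containing the raw passage text
)
# Each return value holds one inner list per query vector
scores, ids, texts = index.get_top_docs(query_vectors=[embedding], n_docs=5)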
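A rough sketch of how the proposed FeastIndex could be structured. Everything here is hypothetical: instead of the `vector_store` and `config` arguments in the snippet above, it takes a `search` callable so the wrapper stays independent of any particular Feast version. In a real integration that callable might call `FeatureStore.retrieve_online_documents_v2(features=..., query=..., top_k=...)` and return `.to_dict()`, but the returned column names (including the assumed `distance` column and `document_id` ID field) should be checked against the Feast version in use.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical backend: (feature_refs, query_vector, top_k) -> column-oriented dict.
SearchFn = Callable[[List[str], List[float], int], Dict[str, List[Any]]]


@dataclass
class FeastIndex:
    search: SearchFn
    table: str                       # feature view name
    text_field: str = "document"
    id_field: str = "document_id"    # assumed ID column
    requested_features: List[str] = field(default_factory=list)
    score_field: str = "distance"    # assumed distance column

    def feature_refs(self) -> List[str]:
        # Feast feature references use the "view:feature" form; dict.fromkeys
        # removes duplicates while preserving order.
        names = [self.id_field, self.text_field, *self.requested_features]
        return [f"{self.table}:{n}" for n in dict.fromkeys(names)]

    def get_top_docs(
        self, query_vectors: Any, n_docs: int = 5
    ) -> Tuple[List[List[float]], List[List[str]], List[List[str]]]:
        # Accept numpy arrays or torch tensors (both expose .tolist()) or plain lists.
        if hasattr(query_vectors, "tolist"):
            query_vectors = query_vectors.tolist()
        doc_scores, doc_ids, docs = [], [], []
        for query in query_vectors:
            cols = self.search(self.feature_refs(), [float(x) for x in query], n_docs)
            # Sort by distance (smaller = closer) and keep at most n_docs rows.
            triples = sorted(
                zip(cols[self.score_field], cols[self.id_field], cols[self.text_field]),
                key=lambda t: t[0],
            )[:n_docs]
            # Negate distances so that higher scores mean more relevant.
            doc_scores.append([-float(d) for d, _, _ in triples])
            doc_ids.append([str(i) for _, i, _ in triples])
            docs.append([str(t) for _, _, t in triples])
        return doc_scores, doc_ids, docs
```

In this sketch the extra `requested_features` are fetched but not returned, since the tuple described in this issue has no slot for them; surfacing metadata would need either a separate lookup by document ID or an extended return type.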

Metadata

Assignees: No one assigned
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
