Description
Is your feature request related to a problem? Please describe.
Extend the retrieve_online_documents_v2() implementation to support integration with Hugging Face’s Transformers-based RagRetriever, which expects a specific method signature and output format via a get_top_docs() method.
https://huggingface.co/docs/transformers/model_doc/rag#transformers.RagRetriever
Add a new method (e.g., get_top_docs) to Feast, or provide a utility class such as FeastIndex that wraps retrieve_online_documents_v2 and returns RAG-compatible results.
get_top_docs(query_vectors, n_docs) is expected to behave as follows:
Input:
query_vectors: A tensor or list of vectors representing one or more queries (usually from a language model like BERT).
n_docs: The number of top documents (e.g., text passages) to retrieve for each query.
Expected Output:
A tuple of:
doc_scores: List[List[float]] # similarity scores
doc_ids: List[List[str]] # string document IDs
docs: List[List[str]] # raw document text
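To make the expected output shape concrete, here is a minimal sketch of a checker for the (doc_scores, doc_ids, docs) tuple described above. The names TopDocs and validate_top_docs are hypothetical helpers for illustration and are not part of Feast or Transformers; the sample values are made up.

```python
from typing import List, Tuple

# Hypothetical alias for the tuple described in this issue.
TopDocs = Tuple[List[List[float]], List[List[str]], List[List[str]]]


def validate_top_docs(result: TopDocs, n_queries: int, n_docs: int) -> None:
    """Raise ValueError if result does not match the proposed output shape."""
    doc_scores, doc_ids, docs = result
    named = (("doc_scores", doc_scores), ("doc_ids", doc_ids), ("docs", docs))
    for name, rows in named:
        # One inner list per query.
        if len(rows) != n_queries:
            raise ValueError(f"{name}: expected {n_queries} rows, got {len(rows)}")
        # At most n_docs entries per query.
        for row in rows:
            if len(row) > n_docs:
                raise ValueError(f"{name}: row has more than {n_docs} entries")
    # Scores, IDs, and texts must line up position by position.
    for scores, ids, texts in zip(doc_scores, doc_ids, docs):
        if not (len(scores) == len(ids) == len(texts)):
            raise ValueError("doc_scores, doc_ids and docs are misaligned")


# Illustrative values only: one query, two retrieved passages.
example: TopDocs = (
    [[0.91, 0.84]],
    [["doc-7", "doc-2"]],
    [["first passage", "second passage"]],
)
validate_top_docs(example, n_queries=1, n_docs=2)
```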
Describe the solution you'd like
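# Proposed usage. FeastIndex does not exist yet; feast_retriever, config,
# feature_view and embedding are placeholders defined elsewhere.
index = FeastIndex(
    vector_store=feast_retriever,
    config=config,
    table=feature_view,
    requested_features=["metadata", "source"],
    text_field="document",
)
# One query vector in, top 5 documents out.
scores, ids, texts = index.get_top_docs(query_vectors=[embedding], n_docs=5)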
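A rough sketch of how such a wrapper could look, under stated assumptions. To keep it self-contained, the constructor is simplified: instead of vector_store, config, and table, it takes a hypothetical search_fn callable that stands in for a call to retrieve_online_documents_v2 and returns column-oriented results as a dict of lists. The id_field and score_field parameters and the fake_search helper are also assumptions for illustration, not existing Feast names.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical stand-in for a vector search call: (query_vector, top_k) -> columns.
SearchFn = Callable[[Sequence[float], int], Dict[str, list]]


class FeastIndex:
    """Sketch of the proposed wrapper; not an existing Feast class."""

    def __init__(
        self,
        search_fn: SearchFn,
        text_field: str = "document",
        id_field: str = "doc_id",      # assumed column name
        score_field: str = "distance",  # assumed column name
    ) -> None:
        self._search_fn = search_fn
        self._text_field = text_field
        self._id_field = id_field
        self._score_field = score_field

    def get_top_docs(
        self, query_vectors: Sequence[Sequence[float]], n_docs: int = 5
    ) -> Tuple[List[List[float]], List[List[str]], List[List[str]]]:
        doc_scores: List[List[float]] = []
        doc_ids: List[List[str]] = []
        docs: List[List[str]] = []
        # Run one search per query vector and collect per-query result rows.
        for vector in query_vectors:
            columns = self._search_fn(list(vector), n_docs)
            scores = columns.get(self._score_field, [])[:n_docs]
            ids = columns.get(self._id_field, [])[:n_docs]
            texts = columns.get(self._text_field, [])[:n_docs]
            doc_scores.append([float(s) for s in scores])
            doc_ids.append([str(i) for i in ids])
            docs.append([str(t) for t in texts])
        return doc_scores, doc_ids, docs


def fake_search(vector: Sequence[float], top_k: int) -> Dict[str, list]:
    """Toy backend with made-up rows, used only to exercise the wrapper."""
    return {
        "doc_id": ["a", "b", "c"][:top_k],
        "document": ["alpha", "beta", "gamma"][:top_k],
        "distance": [0.1, 0.2, 0.3][:top_k],
    }


index = FeastIndex(search_fn=fake_search, text_field="document")
scores, ids, texts = index.get_top_docs(query_vectors=[[0.0, 1.0]], n_docs=2)
```

Whether the score column holds a distance or a similarity depends on the underlying store and metric, so a real implementation may need to convert it before returning doc_scores.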