@shreyashankar
Collaborator

This change introduces first-class retrievers powered by LanceDB (OSS) and integrates retrieval-based context into LLM operations. Pipelines can now define retrievers once at the top level, specifying what to index and what to query via Jinja expressions (always using `input.*`). At runtime, any map, extract, reduce, or filter op can attach `retriever: name`; the op fetches FTS, embedding, or hybrid results from LanceDB and augments the prompt with `{{ retrieval_context }}` (or safely prepends the retrieved context if the placeholder is omitted). The implementation follows LanceDB’s native FTS and hybrid search patterns, including explicit FTS query mode and RRF-based hybrid reranking, in line with the docs: https://lancedb.com/docs/search/hybrid-search/.
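
For readers unfamiliar with RRF (reciprocal rank fusion), here is a minimal plain-Python sketch of the general technique used to merge an FTS ranking with an embedding ranking. It is not the code in this PR and not LanceDB's implementation; the function name `rrf_fuse`, the document ids, and the constant `k=60` (a commonly used default) are assumptions chosen for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document ids with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. `k` dampens the influence of top positions;
    60 is a common default and is an assumption here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; break ties by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a full-text search and an embedding search.
fts_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]

fused = rrf_fuse([fts_hits, vector_hits])
print(fused)
```

Documents that appear in both lists (`doc_a`, `doc_b`) rise above those found by only one search, which is the behavior hybrid retrieval relies on. Because the fusion only uses ranks, the FTS and embedding scores never need to be on the same scale.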

Overall, this PR adds a RAG primitive without per-op overrides, keeps ops simple, and builds indexes once. It also ships documentation (parameter reference and examples) and tests for FTS, embedding, and hybrid retrieval. LanceDB is an optional dependency, installed via the “retrieval” extra.
