Feat: Add first-class LanceDB retrievers and prompt augmentation across LLM operators #460
This change introduces first-class retrievers powered by LanceDB (OSS) and integrates retrieval-based context into LLM operations. Pipelines can now define retrievers once at the top level, specifying what to index and what to query via Jinja expressions (always using input.*). At runtime, any map, extract, reduce, or filter op can attach retriever: name; the op then fetches FTS, embedding, or hybrid results from LanceDB and augments the prompt through {{ retrieval_context }} (or a safe prepend if the placeholder is omitted). The implementation follows LanceDB’s native FTS and hybrid search patterns, including explicit FTS query mode and RRF-based hybrid reranking, in line with the docs: https://lancedb.com/docs/search/hybrid-search/.

Overall, this PR adds a RAG primitive without per-op overrides, keeps ops simple, and builds indexes once. It also ships documentation (parameter reference and examples) and tests for FTS, embedding, and hybrid retrieval. Retrieval is an optional install via the “retrieval” extra (lancedb).
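For reviewers unfamiliar with the two mechanisms named above, here is a minimal, library-free sketch of RRF fusion and of the placeholder-or-prepend prompt augmentation. It is not the code in this PR: the function names rrf_fuse and augment_prompt, the constant k=60, the blank-line prepend format, and the regex stand-in for Jinja rendering are all assumptions made for illustration, and LanceDB itself is not called.

```python
import re
from collections import defaultdict

# Matches the {{ retrieval_context }} placeholder with optional inner whitespace.
# Assumption: a regex stands in for real Jinja rendering in this sketch.
PLACEHOLDER = re.compile(r"\{\{\s*retrieval_context\s*\}\}")


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document ids with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used constant (assumed here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def augment_prompt(prompt, context):
    """Insert context at the placeholder, or prepend it if there is none."""
    if PLACEHOLDER.search(prompt):
        # A callable replacement avoids backslash escapes in context
        # being interpreted as regex group references.
        return PLACEHOLDER.sub(lambda _match: context, prompt)
    # Hypothetical prepend format: context, blank line, original prompt.
    return f"{context}\n\n{prompt}"


# Hypothetical result lists from a full-text search and a vector search.
fts_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]
fused = rrf_fuse([fts_hits, vector_hits])

with_placeholder = augment_prompt("Context: {{ retrieval_context }} Summarize.", "CTX")
without_placeholder = augment_prompt("Summarize.", "CTX")
```

Documents that rank well in both lists (doc_b and doc_a here) rise to the top of the fused order, which is the behaviour hybrid reranking relies on. The prepend branch means a prompt without the placeholder still receives the retrieved context.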