Draft: Adds OpenSearch Embedding Search via Bedrock to suggest.py and adds OpenSearch service module #2646

This PR integrates semantic vector search into our suggest.py flow by adding a new service module that uses Amazon OpenSearch Serverless (AOSS) and AWS Bedrock embeddings.
⚠️ Note: this PR is not ready for merge; it is a proof of concept.
With this change, we now have a hybrid search engine that combines:
Semantic vector similarity via Bedrock → OpenSearch
Traditional keyword search (boosted by title/abstract relevance)
This gives us the best of both worlds: meaning-aware search and precision keyword matching.
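The semantic half of this flow (query text → Bedrock embedding → OpenSearch k-NN query) can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: the function names `embed_text` and `build_knn_query`, the model ID `amazon.titan-embed-text-v2:0`, and the field name `embedding` are assumptions. The Bedrock runtime client is passed in so the function can be exercised without AWS credentials.

```python
import json
from typing import Any, Dict, List

# Assumption: a Titan text embedding model that returns 1024-dim vectors.
DEFAULT_MODEL_ID = "amazon.titan-embed-text-v2:0"


def embed_text(
    bedrock_client: Any,
    text: str,
    model_id: str = DEFAULT_MODEL_ID,
    expected_dim: int = 1024,
) -> List[float]:
    """Return the embedding for `text` using a Bedrock runtime client.

    `bedrock_client` is expected to behave like
    boto3.client("bedrock-runtime"), i.e. expose invoke_model().
    """
    response = bedrock_client.invoke_model(
        modelId=model_id,
        body=json.dumps({"inputText": text}),
        contentType="application/json",
        accept="application/json",
    )
    payload = json.loads(response["body"].read())
    embedding = payload["embedding"]
    if len(embedding) != expected_dim:
        raise ValueError(
            f"expected {expected_dim} dimensions, got {len(embedding)}"
        )
    return embedding


def build_knn_query(
    vector: List[float], k: int = 10, field: str = "embedding"
) -> Dict[str, Any]:
    """Build an OpenSearch k-NN query body for the given vector."""
    return {"size": k, "query": {"knn": {field: {"vector": vector, "k": k}}}}
```

Passing the client in as a parameter keeps the embedding call easy to stub in unit tests; the dimension check guards against a model/index mismatch before the query reaches OpenSearch.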
🛠 Implementation Notes
Index Mapping Updates
Added embedding field (knn_vector, 1024-dim) for Titan embeddings
Authors stored as keyword for precise aggregation + display
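For reference, an index body matching the two mapping notes above might look like the sketch below. Only the `embedding` (knn_vector, 1024-dim) and keyword `authors` fields come from this PR description; the `title` and `abstract` text fields, the constant names, and the `index.knn` setting are assumptions added to make the example complete.

```python
from typing import Any, Dict

EMBEDDING_DIM = 1024  # Titan embedding size stated in the PR description

# Hypothetical index body; field names other than "embedding" and
# "authors" are assumptions for illustration.
INDEX_BODY: Dict[str, Any] = {
    "settings": {"index": {"knn": True}},  # enable k-NN search on the index
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "abstract": {"type": "text"},
            # keyword = not analyzed, so exact-match aggregation works
            "authors": {"type": "keyword"},
            "embedding": {"type": "knn_vector", "dimension": EMBEDDING_DIM},
        }
    },
}
```

With opensearch-py, a body like this would typically be passed to `client.indices.create(index=..., body=INDEX_BODY)`.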
Authentication
Integrated AWS4Auth with AOSS (aoss service target)
Supports IAM users or roles with data access policy
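A minimal sketch of the SigV4 setup described above, assuming `boto3` and `opensearch-py` are available alongside `requests-aws4auth`. The helper names `host_from_endpoint` and `build_aoss_client` are hypothetical and may not match the module in this PR. Third-party imports are deferred into the function so that importing the module does not require AWS libraries.

```python
from typing import Any
from urllib.parse import urlparse


def host_from_endpoint(endpoint: str) -> str:
    """Extract the bare host name from an AOSS collection endpoint."""
    if "://" not in endpoint:
        endpoint = f"https://{endpoint}"
    return urlparse(endpoint).netloc


def build_aoss_client(endpoint: str, region: str) -> Any:
    """Create an OpenSearch client signed for the `aoss` service."""
    # Deferred imports: only needed when a client is actually built.
    import boto3
    from opensearchpy import OpenSearch, RequestsHttpConnection
    from requests_aws4auth import AWS4Auth

    credentials = boto3.Session().get_credentials()
    if credentials is None:
        raise RuntimeError("no AWS credentials found for the current session")

    auth = AWS4Auth(
        credentials.access_key,
        credentials.secret_key,
        region,
        "aoss",  # service target for OpenSearch Serverless
        session_token=credentials.token,  # set for assumed roles, else None
    )
    return OpenSearch(
        hosts=[{"host": host_from_endpoint(endpoint), "port": 443}],
        http_auth=auth,
        use_ssl=True,
        verify_certs=True,
        connection_class=RequestsHttpConnection,
    )
```

The IAM user or role behind these credentials still needs to be listed as a principal in the collection's data access policy; signing alone is not sufficient.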
This PR introduces new pydantic schemas for payload validation
opensearch_service initializes its client lazily, for better performance and graceful degradation
Three new dependencies added to pyproject.toml ("pydantic>=2.0.0,<3.0.0", "requests-aws4auth>=1.2.3,<2.0.0", "iso8601==1.0.2")
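The lazy initialization and graceful degradation mentioned for opensearch_service could be structured along these lines. This is a sketch under assumptions: the class name `LazyClient`, the function `semantic_search`, and the default index name `papers` are hypothetical and not taken from the PR.

```python
import logging
import threading
from typing import Any, Callable, Dict, List, Optional

logger = logging.getLogger(__name__)


class LazyClient:
    """Build the client on first use; remember a failure instead of raising."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._client: Optional[Any] = None
        self._failed = False
        self._lock = threading.Lock()

    def get(self) -> Optional[Any]:
        # Fast path: already built, or already known to be unavailable.
        if self._client is not None or self._failed:
            return self._client
        with self._lock:
            if self._client is None and not self._failed:
                try:
                    self._client = self._factory()
                except Exception:
                    logger.warning("OpenSearch client unavailable", exc_info=True)
                    self._failed = True
        return self._client


def semantic_search(
    lazy: LazyClient, query_body: Dict[str, Any], index: str = "papers"
) -> List[Dict[str, Any]]:
    """Return hits, or an empty list when the client cannot be created."""
    client = lazy.get()
    if client is None:
        return []  # degrade gracefully: caller can fall back to keywords
    response = client.search(index=index, body=query_body)
    return response["hits"]["hits"]
```

One design choice worth noting: this version caches a failed initialization for the lifetime of the process. A retry-after-delay policy may suit a long-running service better.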
Future implementation:
flowchart TD
    A[User Query] --> K[Keywords] --> D[BM25 Search]
    A --> V[Vector Embedding] --> C[OpenSearch k-NN]
    C --> E[Hybrid Scoring]
    D --> E
    E --> F[Final Results]
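The "Hybrid Scoring" step in the flowchart is not specified in this description. One common option is to min-max normalize each result set and take a weighted sum, sketched below. The function names and the `vector_weight` parameter are assumptions for illustration, not the planned implementation.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; identical scores all map to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_scores(
    bm25: Dict[str, float],
    knn: Dict[str, float],
    vector_weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Blend BM25 and k-NN scores per document, highest first.

    A document missing from one result set contributes 0 for that side.
    """
    if not 0.0 <= vector_weight <= 1.0:
        raise ValueError("vector_weight must be between 0 and 1")
    bm25_norm = min_max_normalize(bm25)
    knn_norm = min_max_normalize(knn)
    combined = {
        doc_id: (1.0 - vector_weight) * bm25_norm.get(doc_id, 0.0)
        + vector_weight * knn_norm.get(doc_id, 0.0)
        for doc_id in set(bm25_norm) | set(knn_norm)
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)
```

Normalizing first matters because BM25 scores are unbounded while k-NN similarity scores usually fall in a narrow range; without it, one side would dominate the sum.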