syqs (Contributor) commented Oct 2, 2025

This PR integrates semantic vector search into our suggest.py flow by adding a new service that uses Amazon OpenSearch Serverless (AOSS) and AWS Bedrock embeddings.

⚠️ Note: this PR is a proof of concept and is not ready to merge.

flowchart TD
    A[User types a query] --> V[Vector Embedding via Bedrock Titan] --> C[OpenSearch k-NN] --> R[Normalize and sort results] --> D[Display in suggestion results dropdown]
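
For reviewers, the k-NN and "normalize and sort" steps of the flow above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper names `build_knn_query` and `normalize_hits` are hypothetical, the `title` source field is assumed, and dividing by the top score is only one possible normalization. The query body uses the standard OpenSearch `knn` query shape against the `embedding` field.

```python
from typing import Any


def build_knn_query(vector: list[float], k: int = 10) -> dict[str, Any]:
    """Build an OpenSearch k-NN query body against the `embedding` field."""
    return {
        "size": k,
        # Exclude the raw vector from the response to keep payloads small.
        "_source": {"excludes": ["embedding"]},
        "query": {"knn": {"embedding": {"vector": vector, "k": k}}},
    }


def normalize_hits(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Scale scores to [0, 1] by the top score and sort best-first.

    Assumption: top-score scaling is illustrative; the PR may normalize differently.
    """
    hits = response.get("hits", {}).get("hits", [])
    top = max((h["_score"] for h in hits), default=0.0)
    results = [
        {
            "id": h["_id"],
            "title": h.get("_source", {}).get("title", ""),  # assumed field
            "score": h["_score"] / top if top > 0 else 0.0,
        }
        for h in hits
    ]
    return sorted(results, key=lambda r: r["score"], reverse=True)
```

The query vector would come from the Bedrock Titan embedding call, and the list returned by `normalize_hits` is what the suggestion dropdown would render.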

With this change, we now have a hybrid search engine that combines:
Semantic vector similarity via Bedrock → OpenSearch
Traditional keyword search (boosted by title/abstract relevance)
This gives us the best of both worlds: meaning-aware search and precise keyword matching.

🛠 Implementation Notes

Index Mapping Updates

Added embedding field (knn_vector, 1024-dim) for Titan embeddings
Authors stored as keyword for precise aggregation + display
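
As a rough illustration of the mapping changes listed above, an index body along these lines would declare the 1024-dimension vector field and keyword authors. This is a sketch under assumptions: the `title` and `abstract` field types and the constant names `EMBEDDING_DIM` and `INDEX_BODY` are hypothetical, and the actual mapping in this PR may set additional k-NN method parameters.

```python
from typing import Any

EMBEDDING_DIM = 1024  # dimension stated in this PR for Titan embeddings

# Index body as it could be passed to an index-creation call.
INDEX_BODY: dict[str, Any] = {
    "settings": {"index": {"knn": True}},  # enable k-NN on the index
    "mappings": {
        "properties": {
            "embedding": {"type": "knn_vector", "dimension": EMBEDDING_DIM},
            "authors": {"type": "keyword"},  # exact values for aggregation and display
            "title": {"type": "text"},       # assumed field type
            "abstract": {"type": "text"},    # assumed field type
        }
    },
}


def check_embedding(vector: list[float]) -> list[float]:
    """Reject vectors whose length does not match the mapped dimension."""
    if len(vector) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(vector)}")
    return vector
```

Checking the vector length before indexing gives a clearer error than letting the index reject a document with a mismatched dimension.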

Authentication

Integrated AWS4Auth with AOSS (aoss service target)
Supports IAM users or roles that are granted access through an AOSS data access policy

This PR introduces new pydantic schemas for payload validation

opensearch_service includes lazy initialization for better performance and graceful degradation
Three new dependencies added to pyproject.toml: "pydantic>=2.0.0,<3.0.0", "requests-aws4auth>=1.2.3,<2.0.0", "iso8601==1.0.2"
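
The lazy initialization and graceful degradation mentioned for opensearch_service could look roughly like the sketch below. It is not the PR's implementation: `LazyClient`, `suggest`, and the injected `vector_search` and `keyword_search` callables are hypothetical names. The idea is that the client is only built on first use, and a failed build falls back to keyword search instead of raising.

```python
import logging
import threading
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


class LazyClient:
    """Create the underlying client on first use; yield None if creation fails."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory  # e.g. a function that builds the AOSS client
        self._client: Optional[Any] = None
        self._failed = False
        self._lock = threading.Lock()

    def get(self) -> Optional[Any]:
        if self._client is None and not self._failed:
            with self._lock:
                # Re-check inside the lock so only one thread runs the factory.
                if self._client is None and not self._failed:
                    try:
                        self._client = self._factory()
                    except Exception:
                        logger.warning("vector search unavailable", exc_info=True)
                        self._failed = True
        return self._client


def suggest(
    query: str,
    lazy: LazyClient,
    vector_search: Callable[[Any, str], list],
    keyword_search: Callable[[str], list],
) -> list:
    """Use vector search when a client is available, else fall back to keywords."""
    client = lazy.get()
    if client is None:
        return keyword_search(query)
    return vector_search(client, query)
```

With this shape, importing the module costs nothing when vector search is never called, and missing credentials degrade to the existing keyword path. This sketch never retries after a failure; a real service might retry after a delay.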

Future implementation:

flowchart TD
    A[User Query] --> K[Keywords] --> D[BM25 Search]
    A --> V[Vector Embedding] --> C[OpenSearch k-NN]
    C --> E[Hybrid Scoring]
    D --> E
    E --> F[Final Results]
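
One way the "Hybrid Scoring" node in the future flow could work is a weighted sum of min-max normalized BM25 and k-NN scores. The sketch below is only an illustration of that idea: the function names, the `alpha` weight, and the choice of min-max normalization are assumptions, not a decision made in this PR.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; identical scores all map to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_scores(
    bm25: dict[str, float],
    knn: dict[str, float],
    alpha: float = 0.5,  # assumed weight: 1.0 = vector only, 0.0 = keyword only
) -> list[tuple[str, float]]:
    """Combine normalized BM25 and k-NN scores; a missing score counts as 0."""
    b, v = min_max(bm25), min_max(knn)
    combined = {
        doc: alpha * v.get(doc, 0.0) + (1 - alpha) * b.get(doc, 0.0)
        for doc in b.keys() | v.keys()
    }
    # Sort by score descending, then by document id for a stable order.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
```

Normalizing each list first matters because BM25 scores are unbounded while vector similarity scores usually fall in a narrow range, so summing the raw values would let one signal dominate.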

…zy initialization for performance and graceful degradation - showcases using AWS Bedrock embeddings and vector search via AWS OpenSearch

sonarqubecloud bot commented Oct 2, 2025
