feat: Add retrieve online documents v2 method into pgvector #5253
Signed-off-by: yassinnouh21 <[email protected]>
Signed-off-by: yassinnouh21 <[email protected]>
Signed-off-by: yassinnouh21 <[email protected]>
a17a7fa
to
776c327
Compare
sdk/python/feast/infra/online_stores/postgres_online_store/postgres.py
top_k=sql.Literal(top_k),
)

cur.execute(
cur.execute and cur.fetchall() are repeated in every branch. Only the params differ, so each branch could set params and the call could happen once at the end:
if hybrid search, params = [embedding, tsquery_str, string_fields, tsquery_str]
if vector search, params = [embedding]
.....
cur.execute(query, params)
rows = cur.fetchall()
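The suggestion above can be sketched as follows. This is a minimal illustration, not the code that was merged: `build_params`, `run_query`, and `RecordingCursor` are hypothetical names, and only the two parameter lists spelled out in the comment are implemented, since the comment elides the rest.

```python
from typing import Any, List, Optional, Sequence, Tuple


def build_params(
    embedding: Optional[Sequence[float]],
    tsquery_str: Optional[str],
    string_fields: Sequence[str],
) -> List[Any]:
    """Pick the query parameters for the requested search mode (hypothetical helper)."""
    if embedding is not None and tsquery_str is not None:
        # Hybrid search: parameter order copied from the review comment.
        return [embedding, tsquery_str, string_fields, tsquery_str]
    if embedding is not None:
        # Vector-only search.
        return [embedding]
    # The review comment elides the remaining branches, so they are not guessed here.
    raise NotImplementedError("parameters for this search mode are not shown in the review")


def run_query(cur: Any, query: Any, params: List[Any]) -> List[Tuple[Any, ...]]:
    """The single execute/fetch pair shared by every branch."""
    cur.execute(query, params)
    return cur.fetchall()


class RecordingCursor:
    """Hypothetical stand-in for a database cursor, used only to exercise the sketch."""

    def __init__(self, rows: List[Tuple[Any, ...]]) -> None:
        self.rows = rows
        self.calls: List[Tuple[Any, List[Any]]] = []

    def execute(self, query: Any, params: List[Any]) -> None:
        # Record the call instead of talking to a database.
        self.calls.append((query, params))

    def fetchall(self) -> List[Tuple[Any, ...]]:
        return self.rows


cursor = RecordingCursor(rows=[("doc-1", 0.12)])
params = build_params([0.1, 0.2], None, ["title"])
rows = run_query(cursor, "SELECT ...", params)
```

With this shape, each branch only decides which query and parameters to use, and the database round trip lives in one place.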
entities_dict[key]["text_rank"], float(text_rank)
)

if embedding is not None and query_string is not None:
Can this be simplified?
def sort_key(item: Dict[str, Any]) -> float:
    return item["vector_distance"] if embedding else item["text_rank"]
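A runnable sketch of the simplified sort key is shown below. The `rank_results` wrapper and the sample rows are hypothetical, and the sort direction is an assumption the thread does not state: a smaller vector distance is treated as better and a larger text rank as better. The sketch also checks `embedding is not None` rather than truthiness, because `if embedding` treats an empty list as missing and raises for multi-element NumPy arrays.

```python
from typing import Any, Dict, List, Optional, Sequence


def rank_results(
    items: List[Dict[str, Any]],
    embedding: Optional[Sequence[float]],
) -> List[Dict[str, Any]]:
    """Order results by vector distance when an embedding is given, else by text rank."""
    use_distance = embedding is not None

    def sort_key(item: Dict[str, Any]) -> float:
        return item["vector_distance"] if use_distance else item["text_rank"]

    # Assumption: distances sort ascending (closest first),
    # text ranks sort descending (most relevant first).
    return sorted(items, key=sort_key, reverse=not use_distance)


# Hypothetical sample rows, not taken from the PR.
items = [
    {"id": "a", "vector_distance": 0.9, "text_rank": 0.8},
    {"id": "b", "vector_distance": 0.1, "text_rank": 0.3},
]
by_distance = [item["id"] for item in rank_results(items, embedding=[0.1, 0.2])]
by_text_rank = [item["id"] for item in rank_results(items, embedding=None)]
```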
…d requested features Signed-off-by: Yassin Nouh <[email protected]>
Signed-off-by: Yassin Nouh <[email protected]>
Signed-off-by: Yassin Nouh <[email protected]>
Signed-off-by: Yassin Nouh <[email protected]>
Signed-off-by: Yassin Nouh <[email protected]>
@ntkathole can you take a quick look?
# keep the vector_value_type as BYTEA if pgvector is not enabled, to maintain compatibility
vector_value_type = "BYTEA"

has_string_features = any(
Maybe there's a more explicit way to handle this? Feels like this could be cleaner.
Do you think this would be a better version?
has_string_features = any(
    f.dtype.to_value_type() == ValueType.STRING
    for f in table.features
)
yes!
Signed-off-by: Yassin Nouh <[email protected]>
Signed-off-by: Yassin Nouh <[email protected]>
1cc3f0e
to
55dec54
Compare
@franciscojavierarceo done, please take a look.
Looks good to me!
@franciscojavierarceo I think the CI failure is caused by this, because it is unrelated to the files changed in this PR.
  import time
  import unittest
- from datetime import timedelta
+ from datetime import datetime, timedelta
I believe this is what broke the integration tests.
The naming is unfortunate: previously the name datetime referred to the datetime module, and now you've imported the datetime class from the datetime module, so the same name refers to the class instead. That is what leads to the issue.
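The name clash described above can be reproduced in a few lines. This is a generic illustration of the failure mode; the calls the integration test actually makes are not shown in this thread, so the date used here is only a placeholder.

```python
import datetime  # here the name `datetime` is the module

# Module-style call: module.class(...)
module_style = datetime.datetime(2025, 4, 29)

from datetime import datetime, timedelta  # rebinds `datetime` to the class

# Class-style call: class(...)
class_style = datetime(2025, 4, 29)

# Code written for the module-style import now fails, because the
# `datetime` class has no attribute named `datetime`.
try:
    datetime.datetime(2025, 4, 29)
    error_message = ""
except AttributeError as exc:
    error_message = str(exc)

one_day = timedelta(days=1)  # unaffected: imported by name in both versions
```

Either keeping one import style throughout the file or aliasing the module (for example `import datetime as dt`) avoids the rebinding.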
Thanks for this! I need to change the import or update the integration test.
Signed-off-by: yassinnouh21 <[email protected]>
@franciscojavierarceo we are ok to merge
🚀🚀🚀
…v#5253)
* feat: add online document retrieval with hybrid search capabilities
  Signed-off-by: yassinnouh21 <[email protected]>
* test: add integration tests for hybrid search and document retrieval
  Signed-off-by: yassinnouh21 <[email protected]>
* fix formatting
  Signed-off-by: yassinnouh21 <[email protected]>
* fix: Refactor string_fields assignment to filter features by dtype and requested features
  Signed-off-by: Yassin Nouh <[email protected]>
* fix: improve query execution logic in postgres.py
  Signed-off-by: Yassin Nouh <[email protected]>
* fix linter
  Signed-off-by: Yassin Nouh <[email protected]>
* fix: simplify sorting logic in query execution
  Signed-off-by: Yassin Nouh <[email protected]>
* fix formatting
  Signed-off-by: Yassin Nouh <[email protected]>
* fix: update string feature check to use ValueType enumeration
  Signed-off-by: Yassin Nouh <[email protected]>
* formatting
  Signed-off-by: Yassin Nouh <[email protected]>
* fix datetime
  Signed-off-by: yassinnouh21 <[email protected]>
---------
Signed-off-by: yassinnouh21 <[email protected]>
Signed-off-by: Yassin Nouh <[email protected]>
# [0.49.0](v0.48.0...v0.49.0) (2025-04-29)

### Bug Fixes

* Adding brackets to unit tests ([c46fea3](c46fea3))
* Adding logic back for a step ([2bb240b](2bb240b))
* Adjustment for unit test action ([a6f78ae](a6f78ae))
* Allow get_historical_features with only On Demand Feature View ([#5256](#5256)) ([0752795](0752795))
* CI adjustment ([3850643](3850643))
* Embed Query configuration breaks when switching between DataFrame and SQL ([#5257](#5257)) ([32375a5](32375a5))
* Fix for proto issue in utils ([1b291b2](1b291b2))
* Fix milvus online_read ([#5233](#5233)) ([4b91f26](4b91f26))
* Fix tests ([431d9b8](431d9b8))
* Fixed Permissions object parameter in example ([#5259](#5259)) ([045c100](045c100))
* Java CI [#12](#12) ([d7e44ac](d7e44ac))
* Java PR [#15](#15) ([a5da3bb](a5da3bb))
* Java PR [#16](#16) ([e0320fe](e0320fe))
* Java PR [#17](#17) ([49da810](49da810))
* Materialization logs ([#5243](#5243)) ([4aa2f49](4aa2f49))
* Moving to custom github action for checking skip tests ([caf312e](caf312e))
* Operator - remove default replicas setting from Feast Deployment ([#5294](#5294)) ([e416d01](e416d01))
* Patch java pr [#14](#14) ([592526c](592526c))
* Patch update for test ([a3e8967](a3e8967))
* Remove conditional from steps ([995307f](995307f))
* Remove misleading HTTP prefix from gRPC endpoints in logs and doc ([#5280](#5280)) ([0ee3a1e](0ee3a1e))
* removing id ([268ade2](268ade2))
* Renaming workflow file ([5f46279](5f46279))
* Resolve `no pq wrapper` import issue ([#5240](#5240)) ([d5906f1](d5906f1))
* Update actions to remove check skip tests ([#5275](#5275)) ([b976f27](b976f27))
* Update docling demo ([446efea](446efea))
* Update java pr [#13](#13) ([fda7db7](fda7db7))
* Update java_pr ([fa138f4](fa138f4))
* Update repo_config.py ([6a59815](6a59815))
* Update unit tests workflow ([06486a0](06486a0))
* Updated docs for docling demo ([768e6cc](768e6cc))
* Updating action for unit tests ([0996c28](0996c28))
* Updating github actions to filter at job level ([0a09622](0a09622))
* Updating Java CI ([c7c3a3c](c7c3a3c))
* Updating java pr to skip tests ([e997dd9](e997dd9))
* Updating workflows ([c66bcd2](c66bcd2))

### Features

* Add date_partition_column_format for spark source ([#5273](#5273)) ([7a61d6f](7a61d6f))
* Add Milvus tutorial with Feast integration ([#5292](#5292)) ([a1388a5](a1388a5))
* Add pgvector tutorial with PostgreSQL integration ([#5290](#5290)) ([bb1cbea](bb1cbea))
* Add ReactFlow visualization for Feast registry metadata ([#5297](#5297)) ([9768970](9768970))
* Add retrieve online documents v2 method into pgvector ([#5253](#5253)) ([6770ee6](6770ee6))
* Compute Engine Initial Implementation ([#5223](#5223)) ([64bdafd](64bdafd))
* Enable write node for compute engine ([#5287](#5287)) ([f9baf97](f9baf97))
* Local compute engine ([#5278](#5278)) ([8e06dfe](8e06dfe))
* Make transform on writes configurable for ingestion ([#5283](#5283)) ([ecad170](ecad170))
* Offline store update pull_all_from_table_or_query to make timestampfield optional ([#5281](#5281)) ([4b94608](4b94608))
* Serialization version 2 deprecation notice ([#5248](#5248)) ([327d99d](327d99d))
* Vector length definition moved to Feature View from Config ([#5289](#5289)) ([d8f1c97](d8f1c97))
What this PR does / why we need it:
This PR enhances the PostgreSQL online store to support hybrid search, combining vector similarity search with full-text search.
Specifically, it adds the retrieve_online_documents_v2 function, which handles vector-only, text-only, and hybrid cases gracefully.
This update supports the broader goal of enabling more intelligent, contextual document retrieval in Feast's online stores.
Which issue(s) this PR fixes:
Fixes #5115
Part of the roadmap to Introduce Feast NLP/LLM Add-On, enabling advanced search capabilities in vector databases.
Misc