YassinNouh21
Contributor

What this PR does / why we need it:

This PR enhances the PostgreSQL online store to support hybrid search capabilities, combining both vector similarity search and full-text search.

Specifically:

  • Introduces the ability to perform hybrid queries using both embeddings and keyword-based search.
  • Extends the retrieve_online_documents_v2 function to handle vector-only, text-only, and hybrid cases gracefully.
  • Improves feature retrieval by dynamically selecting features based on query type (distance, text_rank).
  • Adds comprehensive integration tests to validate:
    • Vector similarity search (L2 and cosine distance)
    • Full-text search
    • Hybrid search (vector + text)
    • Edge cases (non-matching queries, category filtering)
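
The three query cases listed above can be sketched as a small dispatch step. This is only an illustration, not the code merged in this PR: the helper names `select_search_mode` and `score_fields` are hypothetical, and the mapping of modes to score fields is an assumption based on the `distance` and `text_rank` names mentioned in the description.

```python
from typing import Dict, List, Optional


def select_search_mode(
    embedding: Optional[List[float]], query_string: Optional[str]
) -> str:
    """Classify a request as 'hybrid', 'vector', or 'text' (hypothetical helper)."""
    if embedding is not None and query_string is not None:
        return "hybrid"
    if embedding is not None:
        return "vector"
    if query_string is not None:
        return "text"
    raise ValueError("Provide an embedding, a query_string, or both")


# Assumed mapping: which score fields each mode can populate.
_SCORE_FIELDS: Dict[str, List[str]] = {
    "hybrid": ["distance", "text_rank"],
    "vector": ["distance"],
    "text": ["text_rank"],
}


def score_fields(mode: str) -> List[str]:
    """Return a copy of the score fields for a given mode."""
    return list(_SCORE_FIELDS[mode])
```

Comparing against `None` rather than relying on truthiness keeps an empty query string or an empty embedding list from silently changing the mode.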

This update supports the broader goal of enabling more intelligent, contextual document retrieval in Feast's online stores.

Which issue(s) this PR fixes:

Fixes #5115
Part of the roadmap to Introduce Feast NLP/LLM Add-On, enabling advanced search capabilities in vector databases.

Misc

@YassinNouh21 YassinNouh21 requested a review from a team as a code owner April 8, 2025 23:18
@YassinNouh21 YassinNouh21 force-pushed the feat/pgvector-retrieve-online-documents-v2 branch from a17a7fa to 776c327 Compare April 8, 2025 23:18
@YassinNouh21 YassinNouh21 changed the title feat: add retrieve online documents v2 method into pgvector feat:Add retrieve online documents v2 method into pgvector Apr 8, 2025
@YassinNouh21 YassinNouh21 changed the title feat:Add retrieve online documents v2 method into pgvector feat: Add retrieve online documents v2 method into pgvector Apr 8, 2025
top_k=sql.Literal(top_k),
)

cur.execute(
Member

cur.execute and cur.fetchall() are repeated in every branch. Build the parameters per branch, then execute once:

if hybrid search, params = [embedding, tsquery_str, string_fields, tsquery_str]
if vector search, params = [embedding]
.....

cur.execute(query, params)
rows = cur.fetchall()
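
A minimal sketch of this suggestion, assuming a DB-API style cursor. The helper names `build_params` and `fetch_rows` are hypothetical, and only the two parameter lists spelled out in the comment are implemented; the branches the comment elides are left unimplemented rather than guessed.

```python
from typing import Any, List, Optional, Sequence


def build_params(
    embedding: Optional[Sequence[float]],
    tsquery_str: Optional[str],
    string_fields: Sequence[str],
) -> List[Any]:
    """Collect positional parameters for the active search mode (hypothetical helper)."""
    if embedding is not None and tsquery_str is not None:
        # Hybrid search: order taken from the review comment.
        return [embedding, tsquery_str, string_fields, tsquery_str]
    if embedding is not None:
        # Vector-only search.
        return [embedding]
    # Remaining branches are elided in the review comment, so not filled in here.
    raise NotImplementedError("parameter list not specified for this mode")


def fetch_rows(cur: Any, query: Any, params: List[Any]) -> List[Any]:
    """Execute once and fetch once, regardless of which branch built the params."""
    cur.execute(query, params)
    return cur.fetchall()
```

With this shape, each branch only decides which query and parameters to use, and the execute/fetch pair appears in a single place.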

entities_dict[key]["text_rank"], float(text_rank)
)

if embedding is not None and query_string is not None:
Member

Can this be simplified?

def sort_key(item: Dict[str, Any]) -> float:
    return item["vector_distance"] if embedding else item["text_rank"]
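
One point worth keeping in mind with a single sort key is direction: a smaller vector distance is a closer match, while a larger PostgreSQL text rank is a more relevant match. The sketch below is one way to handle that and is not necessarily what was merged; `sort_results` is a hypothetical helper, and the `vector_distance` and `text_rank` keys are taken from the snippet above.

```python
from typing import Any, Dict, List, Optional, Sequence


def sort_results(
    items: List[Dict[str, Any]], embedding: Optional[Sequence[float]]
) -> List[Dict[str, Any]]:
    """Order results best-first (hypothetical helper).

    With an embedding, sort by ascending vector_distance; otherwise sort by
    descending text_rank.
    """
    by_distance = embedding is not None
    key = "vector_distance" if by_distance else "text_rank"
    return sorted(items, key=lambda item: item[key], reverse=not by_distance)
```

Using `embedding is not None` instead of plain truthiness also avoids surprises if the embedding arrives as an array type whose truth value is ambiguous.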

@YassinNouh21 YassinNouh21 requested a review from ntkathole April 9, 2025 10:04
@YassinNouh21
Contributor Author

@ntkathole can you take a quick look?

# keep the vector_value_type as BYTEA if pgvector is not enabled, to maintain compatibility
vector_value_type = "BYTEA"

has_string_features = any(

Maybe there's a more explicit way to handle this? Feels like this could be cleaner.

Contributor Author

Do you think this would be a better version?

has_string_features = any(
    f.dtype.to_value_type() == ValueType.STRING
    for f in table.features
)

yes!
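
For readers without a Feast checkout at hand, the agreed check can be exercised with stand-in types. Everything below except the `any(...)` expression is hypothetical: `ValueType`, `FakeDtype`, and `FakeField` are simplified stand-ins for illustration, not Feast's real classes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ValueType(Enum):
    """Hypothetical stand-in for Feast's ValueType enumeration."""

    STRING = 1
    FLOAT = 2


@dataclass
class FakeDtype:
    """Stand-in dtype exposing the to_value_type() call used in the thread."""

    value_type: ValueType

    def to_value_type(self) -> ValueType:
        return self.value_type


@dataclass
class FakeField:
    """Stand-in for a feature with a name and a dtype."""

    name: str
    dtype: FakeDtype


def has_string_features(features: List[FakeField]) -> bool:
    """Return True if at least one feature is a string feature."""
    return any(f.dtype.to_value_type() == ValueType.STRING for f in features)
```

Comparing against an enum member makes the intent explicit and fails loudly if the member name is misspelled.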

@YassinNouh21 YassinNouh21 force-pushed the feat/pgvector-retrieve-online-documents-v2 branch from 1cc3f0e to 55dec54 Compare April 9, 2025 16:10
@YassinNouh21
Contributor Author

@franciscojavierarceo done, please take a look.

Member

@ntkathole ntkathole left a comment

Looks good to me!

@YassinNouh21
Contributor Author

YassinNouh21 commented Apr 9, 2025

@franciscojavierarceo I think the CI failure comes from lines 222 to 225 of sdk/python/tests/integration/online_store/test_universal_online.py:

 sdk/python/tests/integration/online_store/test_universal_online.py
         # writes to online store via datasource (dataframe_source) materialization
         fs.materialize(
-            start_date=datetime.datetime.now() - timedelta(hours=12),
+            start_date=datetime.now() - timedelta(hours=12),
             end_date=_utc_now(),
         )

It is unrelated to the files changed in this PR.

import time
import unittest
from datetime import timedelta
from datetime import datetime, timedelta

I believe this is what broke the integration tests.

It's unfortunate naming: previously we imported the datetime module, and now you've imported the datetime class from the datetime module, so the existing datetime.datetime.now() call in the test no longer resolves.
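
The name clash can be reproduced in isolation with the standard library alone. The variable names below are illustrative only.

```python
import datetime as datetime_module
from datetime import datetime, timedelta

# With the class bound to the name `datetime`, call now() on it directly.
start_date = datetime.now() - timedelta(hours=12)

# The old spelling now looks up `.datetime` on the class, which has no such attribute.
try:
    datetime.datetime.now()
    old_spelling_works = True
except AttributeError:
    old_spelling_works = False

# An explicit module alias shows that both names refer to the same class.
same_class = datetime_module.datetime is datetime
```

Either spelling is fine on its own; the failure only appears when the import style and the call sites disagree, which is why the fix is to change one or the other.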

Member

@franciscojavierarceo franciscojavierarceo left a comment

Thanks for this! You need to either change the import or update the integration test.

Signed-off-by: yassinnouh21 <[email protected]>
@YassinNouh21
Contributor Author

@franciscojavierarceo we are ok to merge

Member

@franciscojavierarceo franciscojavierarceo left a comment

🚀🚀🚀

@franciscojavierarceo franciscojavierarceo merged commit 6770ee6 into feast-dev:master Apr 11, 2025
22 checks passed
@YassinNouh21 YassinNouh21 deleted the feat/pgvector-retrieve-online-documents-v2 branch April 11, 2025 07:16
tchughesiv pushed a commit to tchughesiv/feast that referenced this pull request Apr 14, 2025
…v#5253)

* feat: add online document retrieval with hybrid search capabilities

Signed-off-by: yassinnouh21 <[email protected]>

* test: add integration tests for hybrid search and document retrieval

Signed-off-by: yassinnouh21 <[email protected]>

* fix formatting

Signed-off-by: yassinnouh21 <[email protected]>

* fix: Refactor string_fields assignment to filter features by dtype and requested features

Signed-off-by: Yassin Nouh <[email protected]>

* fix: improve query execution logic in postgres.py

Signed-off-by: Yassin Nouh <[email protected]>

* fix linter

Signed-off-by: Yassin Nouh <[email protected]>

* fix: simplify sorting logic in query execution

Signed-off-by: Yassin Nouh <[email protected]>

* fix formatting

Signed-off-by: Yassin Nouh <[email protected]>

* fix: update string feature check to use ValueType enumeration

Signed-off-by: Yassin Nouh <[email protected]>

* formatting

Signed-off-by: Yassin Nouh <[email protected]>

* fix datetime

Signed-off-by: yassinnouh21 <[email protected]>

---------

Signed-off-by: yassinnouh21 <[email protected]>
Signed-off-by: Yassin Nouh <[email protected]>
franciscojavierarceo pushed a commit that referenced this pull request Apr 29, 2025
# [0.49.0](v0.48.0...v0.49.0) (2025-04-29)

### Bug Fixes

* Adding brackets to unit tests ([c46fea3](c46fea3))
* Adding logic back for a step ([2bb240b](2bb240b))
* Adjustment for unit test action ([a6f78ae](a6f78ae))
* Allow get_historical_features with only On Demand Feature View ([#5256](#5256)) ([0752795](0752795))
* CI adjustment ([3850643](3850643))
* Embed Query configuration breaks when switching between DataFrame and SQL ([#5257](#5257)) ([32375a5](32375a5))
* Fix for proto issue in utils ([1b291b2](1b291b2))
* Fix milvus online_read ([#5233](#5233)) ([4b91f26](4b91f26))
* Fix tests ([431d9b8](431d9b8))
* Fixed Permissions object parameter in example ([#5259](#5259)) ([045c100](045c100))
* Java CI [#12](#12) ([d7e44ac](d7e44ac))
* Java PR [#15](#15) ([a5da3bb](a5da3bb))
* Java PR [#16](#16) ([e0320fe](e0320fe))
* Java PR [#17](#17) ([49da810](49da810))
* Materialization logs ([#5243](#5243)) ([4aa2f49](4aa2f49))
* Moving to custom github action for checking skip tests ([caf312e](caf312e))
* Operator - remove default replicas setting from Feast Deployment ([#5294](#5294)) ([e416d01](e416d01))
* Patch java pr [#14](#14) ([592526c](592526c))
* Patch update for test ([a3e8967](a3e8967))
* Remove conditional from steps ([995307f](995307f))
* Remove misleading HTTP prefix from gRPC endpoints in logs and doc ([#5280](#5280)) ([0ee3a1e](0ee3a1e))
* removing id ([268ade2](268ade2))
* Renaming workflow file ([5f46279](5f46279))
* Resolve `no pq wrapper` import issue ([#5240](#5240)) ([d5906f1](d5906f1))
* Update actions to remove check skip tests ([#5275](#5275)) ([b976f27](b976f27))
* Update docling demo ([446efea](446efea))
* Update java pr [#13](#13) ([fda7db7](fda7db7))
* Update java_pr ([fa138f4](fa138f4))
* Update repo_config.py ([6a59815](6a59815))
* Update unit tests workflow ([06486a0](06486a0))
* Updated docs for docling demo ([768e6cc](768e6cc))
* Updating action for unit tests ([0996c28](0996c28))
* Updating github actions to filter at job level ([0a09622](0a09622))
* Updating Java CI ([c7c3a3c](c7c3a3c))
* Updating java pr to skip tests ([e997dd9](e997dd9))
* Updating workflows ([c66bcd2](c66bcd2))

### Features

* Add date_partition_column_format for spark source ([#5273](#5273)) ([7a61d6f](7a61d6f))
* Add Milvus tutorial with Feast integration ([#5292](#5292)) ([a1388a5](a1388a5))
* Add pgvector tutorial with PostgreSQL integration ([#5290](#5290)) ([bb1cbea](bb1cbea))
* Add ReactFlow visualization for Feast registry metadata ([#5297](#5297)) ([9768970](9768970))
* Add retrieve online documents v2 method into  pgvector  ([#5253](#5253)) ([6770ee6](6770ee6))
* Compute Engine Initial Implementation ([#5223](#5223)) ([64bdafd](64bdafd))
* Enable write node for compute engine ([#5287](#5287)) ([f9baf97](f9baf97))
* Local compute engine ([#5278](#5278)) ([8e06dfe](8e06dfe))
* Make transform on writes configurable for ingestion ([#5283](#5283)) ([ecad170](ecad170))
* Offline store update pull_all_from_table_or_query to make timestampfield optional ([#5281](#5281)) ([4b94608](4b94608))
* Serialization version 2 deprecation notice ([#5248](#5248)) ([327d99d](327d99d))
* Vector length definition moved to Feature View from Config  ([#5289](#5289)) ([d8f1c97](d8f1c97))
j-wine pushed a commit to j-wine/feast that referenced this pull request Jun 7, 2025
Signed-off-by: Jacob Weinhold <[email protected]>
j-wine pushed a commit to j-wine/feast that referenced this pull request Jun 7, 2025
Signed-off-by: Jacob Weinhold <[email protected]>

Development

Successfully merging this pull request may close these issues.

Update Elastic Search, QDrant, and PGVector to retrieve_online_documents_v2 method
