feat: Support aggregation in odfv #5666
Signed-off-by: hao-xu5 <[email protected]>
    if odfv.mode == "python":
    # Apply aggregations if configured.
    if odfv.aggregations:
    if odfv.mode == "python":
Currently, if the input is in Python mode, we don't have an aggregation function to use OOTD, so it uses the pandas DataFrame backend to aggregate. I think in the long term we need to create a real-time compute engine that supports Python-native compute operations such as aggregation, filtering, and joining.
What does OOTD mean?
OOTB :)
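To illustrate the Python-mode fallback described in the comment thread above, here is a minimal sketch of routing a dict-of-lists payload through pandas for aggregation and converting it back. The function name `aggregate_python_rows`, its parameters, and the sample columns are hypothetical and are not taken from the Feast codebase; only the pandas calls are real.

```python
import pandas as pd


def aggregate_python_rows(rows, group_keys, agg_specs):
    """Hypothetical helper, not Feast's API.

    rows:       dict mapping column name -> list of values (Python-mode payload)
    group_keys: list of column names to group by
    agg_specs:  dict mapping output column -> (input column, aggregation name)
    """
    # Python-mode input has no native aggregation path, so borrow pandas.
    df = pd.DataFrame(rows)
    # Named aggregation: each keyword becomes one output column.
    grouped = df.groupby(group_keys, as_index=False).agg(**agg_specs)
    # Convert back to the dict-of-lists shape the caller passed in.
    return grouped.to_dict(orient="list")


rows = {"driver_id": [1, 1, 2], "trips": [10, 20, 5]}
result = aggregate_python_rows(
    rows,
    group_keys=["driver_id"],
    agg_specs={"sum_trips": ("trips", "sum")},
)
print(result)
```

The round trip through a DataFrame is the cost the comment points at: a Python-native engine could avoid the conversion.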
Pull Request Overview
This PR adds aggregation support to OnDemandFeatureView (ODFV) and refactors the backend dataframe module structure. The implementation allows ODFVs to apply aggregations during online feature serving, either using aggregations or feature transformations (but not both). The backend modules are moved from the local directory to a parent-level backends directory to enable reusability across different compute engines like Spark.
Key Changes:
- Moved backend dataframe modules from `local/backends` to parent-level `backends` directory
- Added aggregation support to ODFV with `_apply_aggregations_to_response` utility function
- Aggregations and transformations are mutually exclusive in ODFVs to simplify schema inference
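The mutual-exclusivity rule in the last bullet could be enforced at construction time along the following lines. This is a sketch only: `SketchOdfv` and its fields are hypothetical stand-ins, not the real `OnDemandFeatureView` class or its signature.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SketchOdfv:
    """Hypothetical stand-in for an ODFV definition (not Feast's class)."""

    name: str
    aggregations: List[str] = field(default_factory=list)
    feature_transformation: Optional[Callable] = None

    def __post_init__(self):
        # Allowing only one of the two keeps output schema inference simple:
        # the schema comes either from the aggregation specs or from the
        # transformation, never from a combination of both.
        if self.aggregations and self.feature_transformation is not None:
            raise ValueError(
                f"{self.name}: specify either aggregations or a "
                "feature transformation, not both"
            )


agg_only = SketchOdfv(name="driver_stats", aggregations=["sum(trips)"])
```

Failing fast at definition time, rather than at serving time, means a misconfigured view never reaches the online path.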
Reviewed Changes
Copilot reviewed 8 out of 10 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| sdk/python/tests/unit/test_on_demand_feature_view_aggregation.py | New test file covering aggregation functionality in Python and Pandas modes |
| sdk/python/feast/utils.py | Added aggregation utility functions and integrated aggregation logic into ODFV transformation pipeline |
| sdk/python/feast/infra/compute_engines/local/nodes.py | Updated import path for DataFrameBackend |
| sdk/python/feast/infra/compute_engines/local/feature_builder.py | Updated import path for DataFrameBackend |
| sdk/python/feast/infra/compute_engines/local/compute.py | Updated import paths for DataFrameBackend and BackendFactory |
| sdk/python/feast/infra/compute_engines/backends/polars_backend.py | Updated import path for base DataFrameBackend |
| sdk/python/feast/infra/compute_engines/backends/pandas_backend.py | Updated import path for base DataFrameBackend |
| sdk/python/feast/infra/compute_engines/backends/factory.py | Updated import paths for backend classes |
Co-authored-by: Copilot <[email protected]>
ntkathole left a comment
Thank you
# [0.56.0](v0.55.0...v0.56.0) (2025-10-27)

### Bug Fixes

* Add mode field to Transformation proto for proper serialization ([2390d2e](2390d2e))
* Date wise remote offline store historical data retrieval ([#5686](#5686)) ([949ba3d](949ba3d))
* Fix STRING type handling in on-demand feature views ([#5669](#5669)) ([dfbb743](dfbb743))
* Fixed torch install issue in CI ([366e5a8](366e5a8))
* ODFV not getting counted in resource count ([1d640b6](1d640b6))
* Skip tag updates if user do not have permissions ([#5673](#5673)) ([0a951ce](0a951ce))

### Features

* Add document of Go feature server. ([#5697](#5697)) ([cbd1dde](cbd1dde))
* Add flexible commandArgs support for complete Feast CLI control ([#5678](#5678)) ([6414924](6414924))
* Add HDFS as a feature registry ([#5655](#5655)) ([4c65872](4c65872))
* Add nodeSelector to service config ([#5675](#5675)) ([9728cde](9728cde))
* Add OTEL based observability to the Go Feature Server ([#5685](#5685)) ([f4afdad](f4afdad))
* Added health endpoint for the UI ([#5665](#5665)) ([3aec5d5](3aec5d5))
* Added kuberay support ([e0b698d](e0b698d))
* Added support for filtering multi-projects ([#5688](#5688)) ([eb0a86e](eb0a86e))
* Batch Embedding at scale for RAG with Ray ([cc2a46d](cc2a46d))
* Optimize SQL entity handling without creating temporary tables ([#5695](#5695)) ([aa2c838](aa2c838))
* Support aggregation in odfv ([#5666](#5666)) ([564e965](564e965))
* Support cache_mode for registries ([021e9ea](021e9ea))
What this PR does / why we need it:
- Moves the dataframe backends out of `local`, as they don't have to be a local abstraction, and Spark can inherit them as well.
- Adds an `aggregations` config to the OnDemandFeatureView. When transforming the features, it can run either `aggregations` or `feature_transform`, not both. The reason is to keep schema inference simple, but in the future this can be redesigned to support both.

Which issue(s) this PR fixes:
Misc