-
Notifications
You must be signed in to change notification settings - Fork 1.2k
fix: FeatureView serialization with cycle detection #5502
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: HaoXuAI <[email protected]>
Signed-off-by: HaoXuAI <[email protected]>
Signed-off-by: HaoXuAI <[email protected]>
Signed-off-by: HaoXuAI <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Adds support for on-demand feature sources, enforces cycle detection in FeatureView (de)serialization, standardizes the topological sort API, and refreshes related documentation.
- Introduce
OnDemandSourceType
alias and refactorsources
signature inOnDemandFeatureView
. - Rename all
topo_sort
functions/methods totopological_sort
and update callers. - Enhance
FeatureView.to_proto
/from_proto
with cycle detection and update__copy__
/__eq__
. - Add DAG module README and update compute-engine reference docs.
Reviewed Changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
sdk/python/feast/on_demand_feature_view.py | Add OnDemandSourceType alias and clean up sources hints |
sdk/python/feast/infra/compute_engines/feature_resolver.py | Rename resolver method from topo_sort to topological_sort |
sdk/python/feast/infra/compute_engines/feature_builder.py | Update builder calls to topological_sort |
sdk/python/feast/infra/compute_engines/algorithms/topo.py | Rename functions topo_sort[_multiple] to topological_sort[_multiple] |
sdk/python/feast/infra/compute_engines/dag/README.md | Add high-level DAG documentation |
sdk/python/feast/feature_view.py | Implement cycle detection in (de)serialization, update copy/eq |
docs/reference/compute-engine/README.md | Refresh compute-engine table and links |
docs/getting-started/concepts/batch-feature-view.md | New guide for BatchFeatureView |
Comments suppressed due to low confidence (1)
docs/reference/compute-engine/README.md:22
- The markdown link for
ExecutionPlan
is nested as[link]([link](...))
. It should be formatted as a single link, e.g.[link](URL)
.
| `ExecutionPlan` | Executes nodes in dependency order and stores intermediate outputs | [link]([link](https://github.com/feast-dev/feast/blob/master/sdk/python/feast/infra/compute_engines/dag/README.md)) |
Signed-off-by: HaoXuAI <[email protected]>
Signed-off-by: HaoXuAI <[email protected]>
Signed-off-by: HaoXuAI <[email protected]>
for feature in on_demand_feature_view_proto.spec.features | ||
], | ||
sources=sources, | ||
sources=cast(List[OnDemandSourceType], sources), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somehow this was breaking mypy, so put a fix on this.
## ✅ Key Capabilities | ||
|
||
- **Composable DAG of FeatureViews**: Supports defining a `BatchFeatureView` on top of one or more other `FeatureView`s. | ||
- **Transformations**: Apply PySpark-based transformation logic (`feature_transformation` or `udf`) to raw data source, can also be used to deal with multiple data sources. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this exclusively limited to PySpark?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not exclusively to Pyspark, will update it
*, | ||
name: str, | ||
source: Union[DataSource, FeatureView, List[FeatureView]], | ||
sink_source: Optional[DataSource] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for some reason i thought we agreed on naming it sink.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, on a second thought, I think sink_source
is more explicit for the user to know it is passing a data source to this config.
Field(name="conv_rate", dtype=Float32), | ||
], | ||
aggregations=[ | ||
Aggregation(column="conv_rate", function="sum", time_window=timedelta(days=1)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe we should make it avg since summing a conversion rate is weird
## Feature resolver and builder | ||
The `FeatureBuilder` initialize a `FeatureResolver` that extracts a DAG from the `FeatureView` definitions, resolving dependencies and ensuring correct execution order. \ | ||
The FeatureView represents a logical data source, while DataSource represents the physical data source (e.g., BigQuery, Spark, etc.). \ | ||
When defines the FeatureView, the source can be a physical DataSource, a derived FeatureView, or a list of FeatureViews. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing a word here. ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, had some small suggestions that would be great to incorporate though. thanks for the docs!!!
Co-authored-by: Francisco Arceo <[email protected]>
Co-authored-by: Francisco Arceo <[email protected]>
Co-authored-by: Francisco Arceo <[email protected]>
Signed-off-by: HaoXuAI <[email protected]>
Signed-off-by: HaoXuAI <[email protected]>
# [0.51.0](v0.50.0...v0.51.0) (2025-07-21) ### Bug Fixes * FeatureView serialization with cycle detection ([#5502](#5502)) ([f287ca5](f287ca5)) * Fix current version in publish workflow ([#5499](#5499)) ([0af6e94](0af6e94)) * Fix NPM authentication ([#5506](#5506)) ([9f85892](9f85892)) * Fix verify wheels workflow for macos14 ([#5486](#5486)) ([07174cc](07174cc)) * Fixed error thrown for invalid project name on features api ([#5525](#5525)) ([4a9a5d0](4a9a5d0)) * Fixed ODFV on-write transformations ([271ef74](271ef74)) * Move Install OS X dependencies before python setup ([#5488](#5488)) ([35f211c](35f211c)) * Normalize current version by removing 'v' prefix if present ([#5500](#5500)) ([43f3d52](43f3d52)) * Skip macOS 14 with Python 3.10 due to gettext library ([#5490](#5490)) ([41d4977](41d4977)) * Standalone Web UI Publish Workflow ([#5498](#5498)) ([c47b134](c47b134)) ### Features * Added endpoints to allow user to get data for all projects ([4e06965](4e06965)) * Added grpc and rest endpoint for features ([#5519](#5519)) ([0a75696](0a75696)) * Added relationship support to all API endpoints ([#5496](#5496)) ([bea83e7](bea83e7)) * Continue updating doc ([#5523](#5523)) ([ea53b2b](ea53b2b)) * Hybrid offline store ([#5510](#5510)) ([8f1af55](8f1af55)) * Populate created and updated timestamp on data sources ([af3056b](af3056b)) * Provide ready-to-use Python definitions in api ([37628d9](37628d9)) * Snowflake source. fetch MAX in a single query ([#5387](#5387)) ([b49cea1](b49cea1)) * Support compute engine to use multi feature views as source ([#5482](#5482)) ([b9ac90b](b9ac90b)) * Support pagination and sorting on registry apis ([#5495](#5495)) ([c4b6fbe](c4b6fbe)) * Update doc ([#5521](#5521)) ([2808ce1](2808ce1))
# [0.51.0](v0.50.0...v0.51.0) (2025-07-21) ### Bug Fixes * FeatureView serialization with cycle detection ([#5502](#5502)) ([f287ca5](f287ca5)) * Fix current version in publish workflow ([#5499](#5499)) ([0af6e94](0af6e94)) * Fix NPM authentication ([#5506](#5506)) ([9f85892](9f85892)) * Fix verify wheels workflow for macos14 ([#5486](#5486)) ([07174cc](07174cc)) * Fixed error thrown for invalid project name on features api ([#5525](#5525)) ([4a9a5d0](4a9a5d0)) * Fixed ODFV on-write transformations ([271ef74](271ef74)) * Move Install OS X dependencies before python setup ([#5488](#5488)) ([35f211c](35f211c)) * Normalize current version by removing 'v' prefix if present ([#5500](#5500)) ([43f3d52](43f3d52)) * Skip macOS 14 with Python 3.10 due to gettext library ([#5490](#5490)) ([41d4977](41d4977)) * Standalone Web UI Publish Workflow ([#5498](#5498)) ([c47b134](c47b134)) ### Features * Added endpoints to allow user to get data for all projects ([4e06965](4e06965)) * Added grpc and rest endpoint for features ([#5519](#5519)) ([0a75696](0a75696)) * Added relationship support to all API endpoints ([#5496](#5496)) ([bea83e7](bea83e7)) * Continue updating doc ([#5523](#5523)) ([ea53b2b](ea53b2b)) * Hybrid offline store ([#5510](#5510)) ([8f1af55](8f1af55)) * Populate created and updated timestamp on data sources ([af3056b](af3056b)) * Provide ready-to-use Python definitions in api ([37628d9](37628d9)) * Snowflake source. fetch MAX in a single query ([#5387](#5387)) ([b49cea1](b49cea1)) * Support compute engine to use multi feature views as source ([#5482](#5482)) ([b9ac90b](b9ac90b)) * Support pagination and sorting on registry apis ([#5495](#5495)) ([c4b6fbe](c4b6fbe)) * Update doc ([#5521](#5521)) ([2808ce1](2808ce1))
What this PR does / why we need it:
Follow up on PR: #5482.
Which issue(s) this PR fixes:
Misc