Skip to content

Conversation

HaoXuAI
Copy link
Collaborator

@HaoXuAI HaoXuAI commented Jul 10, 2025

What this PR does / why we need it:

Follow up on PR: #5482.

Which issue(s) this PR fixes:

Misc

HaoXuAI added 2 commits July 9, 2025 22:41
Signed-off-by: HaoXuAI <[email protected]>
Signed-off-by: HaoXuAI <[email protected]>
@HaoXuAI HaoXuAI requested a review from a team as a code owner July 10, 2025 05:43
@HaoXuAI HaoXuAI requested a review from Copilot July 10, 2025 05:43
Copilot

This comment was marked as outdated.

HaoXuAI added 2 commits July 9, 2025 23:46
Signed-off-by: HaoXuAI <[email protected]>
Signed-off-by: HaoXuAI <[email protected]>
@HaoXuAI HaoXuAI requested a review from Copilot July 10, 2025 06:48
@HaoXuAI HaoXuAI changed the title Compute engine feaeture view serde fix: FeatureView serialization with cycle detection Jul 10, 2025
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Adds support for on-demand feature sources, enforces cycle detection in FeatureView (de)serialization, standardizes the topological sort API, and refreshes related documentation.

  • Introduce OnDemandSourceType alias and refactor sources signature in OnDemandFeatureView.
  • Rename all topo_sort functions/methods to topological_sort and update callers.
  • Enhance FeatureView.to_proto/from_proto with cycle detection and update __copy__/__eq__.
  • Add DAG module README and update compute-engine reference docs.

Reviewed Changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
sdk/python/feast/on_demand_feature_view.py Add OnDemandSourceType alias and clean up sources hints
sdk/python/feast/infra/compute_engines/feature_resolver.py Rename resolver method from topo_sort to topological_sort
sdk/python/feast/infra/compute_engines/feature_builder.py Update builder calls to topological_sort
sdk/python/feast/infra/compute_engines/algorithms/topo.py Rename functions topo_sort[_multiple] to topological_sort[_multiple]
sdk/python/feast/infra/compute_engines/dag/README.md Add high-level DAG documentation
sdk/python/feast/feature_view.py Implement cycle detection in (de)serialization, update copy/eq
docs/reference/compute-engine/README.md Refresh compute-engine table and links
docs/getting-started/concepts/batch-feature-view.md New guide for BatchFeatureView
Comments suppressed due to low confidence (1)

docs/reference/compute-engine/README.md:22

  • The markdown link for ExecutionPlan is nested as [link]([link](...)). It should be formatted as a single link, e.g. [link](URL).
| `ExecutionPlan`    | Executes nodes in dependency order and stores intermediate outputs   | [link]([link](https://github.com/feast-dev/feast/blob/master/sdk/python/feast/infra/compute_engines/dag/README.md))         |

for feature in on_demand_feature_view_proto.spec.features
],
sources=sources,
sources=cast(List[OnDemandSourceType], sources),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somehow this was breaking mypy, so put a fix on this.

## ✅ Key Capabilities

- **Composable DAG of FeatureViews**: Supports defining a `BatchFeatureView` on top of one or more other `FeatureView`s.
- **Transformations**: Apply PySpark-based transformation logic (`feature_transformation` or `udf`) to raw data source, can also be used to deal with multiple data sources.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this exclusively limited to PySpark?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not exclusively to Pyspark, will update it

*,
name: str,
source: Union[DataSource, FeatureView, List[FeatureView]],
sink_source: Optional[DataSource] = None,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for some reason i thought we agreed on naming it sink.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, on a second thought, I think sink_source is more explicit for the user to know it is passing a data source to this config.

Field(name="conv_rate", dtype=Float32),
],
aggregations=[
Aggregation(column="conv_rate", function="sum", time_window=timedelta(days=1)),
Copy link
Member

@franciscojavierarceo franciscojavierarceo Jul 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe we should make it avg since summing a conversion rate is weird

## Feature resolver and builder
The `FeatureBuilder` initialize a `FeatureResolver` that extracts a DAG from the `FeatureView` definitions, resolving dependencies and ensuring correct execution order. \
The FeatureView represents a logical data source, while DataSource represents the physical data source (e.g., BigQuery, Spark, etc.). \
When defines the FeatureView, the source can be a physical DataSource, a derived FeatureView, or a list of FeatureViews.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing a word here. ?

Copy link
Member

@franciscojavierarceo franciscojavierarceo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, had some small suggestions that would be great to incorporate though. thanks for the docs!!!

@HaoXuAI HaoXuAI merged commit f287ca5 into master Jul 14, 2025
18 of 19 checks passed
franciscojavierarceo pushed a commit that referenced this pull request Jul 21, 2025
# [0.51.0](v0.50.0...v0.51.0) (2025-07-21)

### Bug Fixes

* FeatureView serialization with cycle detection ([#5502](#5502)) ([f287ca5](f287ca5))
* Fix current version in publish workflow ([#5499](#5499)) ([0af6e94](0af6e94))
* Fix NPM authentication ([#5506](#5506)) ([9f85892](9f85892))
* Fix verify wheels workflow for macos14 ([#5486](#5486)) ([07174cc](07174cc))
* Fixed error thrown for invalid project name on features api ([#5525](#5525)) ([4a9a5d0](4a9a5d0))
* Fixed ODFV on-write transformations ([271ef74](271ef74))
* Move Install OS X dependencies before python setup ([#5488](#5488)) ([35f211c](35f211c))
* Normalize current version by removing 'v' prefix if present ([#5500](#5500)) ([43f3d52](43f3d52))
* Skip macOS 14 with Python 3.10 due to gettext library ([#5490](#5490)) ([41d4977](41d4977))
* Standalone Web UI Publish Workflow ([#5498](#5498)) ([c47b134](c47b134))

### Features

* Added endpoints to allow user to get data for all projects ([4e06965](4e06965))
* Added grpc and rest endpoint for features ([#5519](#5519)) ([0a75696](0a75696))
* Added relationship support to all API endpoints ([#5496](#5496)) ([bea83e7](bea83e7))
* Continue updating doc ([#5523](#5523)) ([ea53b2b](ea53b2b))
* Hybrid offline store ([#5510](#5510)) ([8f1af55](8f1af55))
* Populate created and updated timestamp on data sources ([af3056b](af3056b))
* Provide ready-to-use Python definitions in api ([37628d9](37628d9))
* Snowflake source. fetch MAX in a single query ([#5387](#5387)) ([b49cea1](b49cea1))
* Support compute engine to use multi feature views as source ([#5482](#5482)) ([b9ac90b](b9ac90b))
* Support pagination and sorting on registry apis ([#5495](#5495)) ([c4b6fbe](c4b6fbe))
* Update doc ([#5521](#5521)) ([2808ce1](2808ce1))
HaoXuAI pushed a commit that referenced this pull request Aug 11, 2025
# [0.51.0](v0.50.0...v0.51.0) (2025-07-21)

### Bug Fixes

* FeatureView serialization with cycle detection ([#5502](#5502)) ([f287ca5](f287ca5))
* Fix current version in publish workflow ([#5499](#5499)) ([0af6e94](0af6e94))
* Fix NPM authentication ([#5506](#5506)) ([9f85892](9f85892))
* Fix verify wheels workflow for macos14 ([#5486](#5486)) ([07174cc](07174cc))
* Fixed error thrown for invalid project name on features api ([#5525](#5525)) ([4a9a5d0](4a9a5d0))
* Fixed ODFV on-write transformations ([271ef74](271ef74))
* Move Install OS X dependencies before python setup ([#5488](#5488)) ([35f211c](35f211c))
* Normalize current version by removing 'v' prefix if present ([#5500](#5500)) ([43f3d52](43f3d52))
* Skip macOS 14 with Python 3.10 due to gettext library ([#5490](#5490)) ([41d4977](41d4977))
* Standalone Web UI Publish Workflow ([#5498](#5498)) ([c47b134](c47b134))

### Features

* Added endpoints to allow user to get data for all projects ([4e06965](4e06965))
* Added grpc and rest endpoint for features ([#5519](#5519)) ([0a75696](0a75696))
* Added relationship support to all API endpoints ([#5496](#5496)) ([bea83e7](bea83e7))
* Continue updating doc ([#5523](#5523)) ([ea53b2b](ea53b2b))
* Hybrid offline store ([#5510](#5510)) ([8f1af55](8f1af55))
* Populate created and updated timestamp on data sources ([af3056b](af3056b))
* Provide ready-to-use Python definitions in api ([37628d9](37628d9))
* Snowflake source. fetch MAX in a single query ([#5387](#5387)) ([b49cea1](b49cea1))
* Support compute engine to use multi feature views as source ([#5482](#5482)) ([b9ac90b](b9ac90b))
* Support pagination and sorting on registry apis ([#5495](#5495)) ([c4b6fbe](c4b6fbe))
* Update doc ([#5521](#5521)) ([2808ce1](2808ce1))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants