Conversation


@HaoXuAI HaoXuAI commented Jun 30, 2025

What this PR does / why we need it:

Non-breaking change; backward compatible.

Support multiple feature views as a source, e.g.:

  1. Chained: feature_view -> base feature_view -> data_source
base_fv = BatchFeatureView(
    name="hourly_driver_stats",
    entities=[driver],
    schema=[
        Field(name="conv_rate", dtype=Float32),
        Field(name="acc_rate", dtype=Float32),
        Field(name="avg_daily_trips", dtype=Int64),
        Field(name="driver_id", dtype=Int32),
    ],
    online=True,
    offline=True,
    source=source,
)

chained_fv = BatchFeatureView(
    name="daily_driver_stats",
    entities=[driver],
    udf=transform_feature,
    udf_string="transform",
    schema=[
        Field(name="conv_rate", dtype=Float32),
        Field(name="driver_id", dtype=Int32),
    ],
    online=True,
    offline=True,
    source=base_fv,
    sink_source=SparkSource(
        name="daily_driver_stats_sink",
        path="/tmp/daily_driver_stats_sink",
        file_format="parquet",
        timestamp_field="event_timestamp",
        created_timestamp_column="created",
    ),
)
  2. Fan-in: feature_view -> [feature_view1, feature_view2]
multi_view = BatchFeatureView(
    name="multi_view",
    entities=[driver],
    schema=[
        Field(name="driver_id", dtype=Int32),
        Field(name="daily_driver_stats__conv_rate", dtype=Float32),
        Field(name="daily_driver_stats__acc_rate", dtype=Float32),
    ],
    online=True,
    offline=True,
    source=[base_fv, chained_fv],
    sink_source=SparkSource(
        name="multi_view_sink",
        path="/tmp/multi_view_sink",
        file_format="parquet",
        timestamp_field="daily_driver_stats__event_timestamp",
        created_timestamp_column="daily_driver_stats__created",
    ),
)

Diagram: (architecture diagram image omitted)

APIs:

  1. sink_source: Used when you have multiple views as the source. It is required so that materialization knows what output to write to the offline store.
  2. Underlying default join operation: Sources can be merged either via a transformation udf you specify yourself, or via the default join operation: an inner join across each FeatureView's features, and a left join onto the entity df.
  3. Works for both BatchFeatureView and StreamFeatureView.
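To make the default join concrete, here is a minimal pandas sketch of the described behavior. The function and frame names are hypothetical illustrations, not the compute engine's actual code:

```python
import pandas as pd

def default_join(entity_df, view_outputs, join_keys):
    """Hypothetical sketch of the default merge: inner-join the feature
    views' outputs on the shared join keys, then left-join the combined
    features onto the entity df."""
    merged = view_outputs[0]
    for df in view_outputs[1:]:
        merged = merged.merge(df, on=join_keys, how="inner")
    return entity_df.merge(merged, on=join_keys, how="left")

entity_df = pd.DataFrame({"driver_id": [1, 2, 3]})
hourly = pd.DataFrame({"driver_id": [1, 2], "conv_rate": [0.9, 0.7]})
daily = pd.DataFrame({"driver_id": [1, 2], "acc_rate": [0.5, 0.6]})

result = default_join(entity_df, [hourly, daily], ["driver_id"])
# driver 3 keeps its row (left join on the entity df) with missing features
```

The inner join keeps only entities present in every source view, while the final left join preserves every row of the entity df.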

This unlocks the requested ability to join multiple data sources, such as SparkSource + SnowflakeSource.
You can do this with the following setup:

  1. Create one Snowflake FeatureView for SnowflakeSource
  2. Create one Spark FeatureView for SparkSource
  3. Create one FeatureView that uses those FeatureViews as its source.

Which issue(s) this PR fixes:

#5444 (comment)

Misc

TODO:

  1. Not every feature view node needs to be recomputed. Features already computed and loaded into the offline store can be skipped during materialization. This is also known as a stateful store.
  2. The default join operation should be customizable.

@HaoXuAI HaoXuAI requested a review from a team as a code owner June 30, 2025 20:20
HaoXuAI added 2 commits July 7, 2025 00:12
Signed-off-by: HaoXuAI <[email protected]>
Signed-off-by: HaoXuAI <[email protected]>
HaoXuAI added 2 commits July 7, 2025 21:53
Signed-off-by: HaoXuAI <[email protected]>
Signed-off-by: HaoXuAI <[email protected]>
@HaoXuAI HaoXuAI changed the title Draft: Compute engine multi sources and feature view as source Feat: compute engine multi sources and feature view as source Jul 8, 2025
@HaoXuAI HaoXuAI changed the title Feat: compute engine multi sources and feature view as source feat: Compute engine multi sources and feature view as source Jul 8, 2025
HaoXuAI added 5 commits July 7, 2025 23:39
Signed-off-by: HaoXuAI <[email protected]>
Signed-off-by: HaoXuAI <[email protected]>
Signed-off-by: HaoXuAI <[email protected]>
Signed-off-by: HaoXuAI <[email protected]>
Signed-off-by: HaoXuAI <[email protected]>
@HaoXuAI HaoXuAI changed the title feat: Compute engine multi sources and feature view as source feat: Support compute engine use multi feature views as source Jul 8, 2025
@HaoXuAI HaoXuAI changed the title feat: Support compute engine use multi feature views as source feat: Support compute engine to use multi feature views as source Jul 8, 2025
HaoXuAI added 2 commits July 8, 2025 09:06
Signed-off-by: HaoXuAI <[email protected]>
Signed-off-by: HaoXuAI <[email protected]>
entities: List[str]
ttl: Optional[timedelta]
source: DataSource
sink_source: Optional[DataSource] = None

Maybe we should just call it sink?

Collaborator Author

Sounds good, will update it

Member

@franciscojavierarceo franciscojavierarceo left a comment

Some small nits but this mostly lgtm

Can you add a page to the docs before merging this PR? It would be great to share with the community how to use it.

from feast.infra.compute_engines.dag.node import DAGNode


def topo_sort(root: DAGNode) -> List[DAGNode]:

Why not call it topological_sort?
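(For illustration, a topological sort over such DAG nodes can be sketched as below; this DAGNode is a minimal hypothetical stand-in for the imported class, not Feast's actual implementation.)

```python
from typing import List

class DAGNode:
    """Minimal stand-in: a named node with upstream input nodes."""
    def __init__(self, name: str, inputs: List["DAGNode"] = ()):
        self.name = name
        self.inputs = list(inputs)

def topological_sort(root: DAGNode) -> List[DAGNode]:
    """Post-order DFS: every node appears after all of its inputs."""
    order, seen = [], set()
    def visit(node: DAGNode) -> None:
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent in node.inputs:
            visit(parent)
        order.append(node)
    visit(root)
    return order

src = DAGNode("data_source")
base = DAGNode("base_fv", [src])
chained = DAGNode("chained_fv", [base])
multi = DAGNode("multi_view", [base, chained])
print([n.name for n in topological_sort(multi)])
# ['data_source', 'base_fv', 'chained_fv', 'multi_view']
```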

else None
)
source_views = [
FeatureView.from_proto(FeatureViewProto(spec=view_spec, meta=None))
Member

The from_proto() method recursively calls itself for each source view without any depth limit. While cycle detection exists in FeatureResolver, it only runs when the compute engine is used, whereas proto deserialization happens much earlier, during API/registry loading.

We might need to handle this in FeatureView.from_proto().

Member

Also, don't we need to store metadata for nested feature views (meta=None)?

Member

We need both cycle detection and de-duplication during serialization. It may not cause issues for a few feature views, but with many feature views it could cause slowness.

A -> [B, C]
B -> [D, E]
C -> [D, E]

When serializing FeatureViewA, FeatureViewD and FeatureViewE each get serialized twice.

Collaborator Author

Right, that makes sense.
I don't have any metadata required for the compute engine at the moment; what do you think would be useful?
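The diamond case above can be handled with a visiting stack for cycle detection plus a memo for de-duplication. A hedged sketch, using plain dicts in place of protos (View and serialize are hypothetical names, not Feast's API):

```python
from typing import Dict, List, Optional, Set

class View:
    """Hypothetical stand-in for a FeatureView with nested source views."""
    def __init__(self, name: str, source_views: Optional[List["View"]] = None):
        self.name = name
        self.source_views = source_views or []

def serialize(view: View, stack: Optional[Set[str]] = None,
              memo: Optional[Dict[str, dict]] = None) -> dict:
    """Raise on a back-edge (cycle); serialize each view at most once."""
    stack = set() if stack is None else stack
    memo = {} if memo is None else memo
    if view.name in stack:
        raise ValueError(f"cycle detected at {view.name}")
    if view.name in memo:  # de-duplication: reuse the earlier result
        return memo[view.name]
    stack.add(view.name)
    proto = {"name": view.name,
             "source_views": [serialize(v, stack, memo)
                              for v in view.source_views]}
    stack.remove(view.name)
    memo[view.name] = proto
    return proto

# A -> [B, C], B -> [D, E], C -> [D, E]: D and E are serialized once each.
d, e = View("D"), View("E")
a = View("A", [View("B", [d, e]), View("C", [d, e])])
proto = serialize(a)
```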

ttl: Optional[timedelta]
batch_source: DataSource
stream_source: Optional[DataSource]
source_views: Optional[List["FeatureView"]]
Member

In __eq__, I think we also need to compare source_views; otherwise two FeatureViews with different source dependencies will be considered equal.

Same for __copy__.

Collaborator Author

good catch
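A minimal sketch of the suggested fix, on a trimmed, hypothetical FeatureView (the real class compares many more fields):

```python
import copy

class FeatureView:
    """Trimmed illustration of including source_views in __eq__/__copy__."""
    def __init__(self, name, source_views=None):
        self.name = name
        self.source_views = source_views or []

    def __eq__(self, other):
        if not isinstance(other, FeatureView):
            return NotImplemented
        # Without the source_views clause, two views with different
        # source dependencies would compare equal.
        return (self.name == other.name
                and self.source_views == other.source_views)

    def __copy__(self):
        fv = FeatureView(self.name)
        fv.source_views = list(self.source_views)  # carried over in copies
        return fv

base = FeatureView("base")
a = FeatureView("v", source_views=[base])
b = FeatureView("v", source_views=[])
assert a != b             # differ only in source dependencies
assert copy.copy(a) == a  # copy preserves source dependencies
```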

@HaoXuAI HaoXuAI merged commit b9ac90b into master Jul 10, 2025
17 checks passed
@HaoXuAI
Collaborator Author

HaoXuAI commented Jul 10, 2025

Merged it; will open a new PR with the following suggestions.

franciscojavierarceo pushed a commit that referenced this pull request Jul 21, 2025
# [0.51.0](v0.50.0...v0.51.0) (2025-07-21)

### Bug Fixes

* FeatureView serialization with cycle detection ([#5502](#5502)) ([f287ca5](f287ca5))
* Fix current version in publish workflow ([#5499](#5499)) ([0af6e94](0af6e94))
* Fix NPM authentication ([#5506](#5506)) ([9f85892](9f85892))
* Fix verify wheels workflow for macos14 ([#5486](#5486)) ([07174cc](07174cc))
* Fixed error thrown for invalid project name on features api ([#5525](#5525)) ([4a9a5d0](4a9a5d0))
* Fixed ODFV on-write transformations ([271ef74](271ef74))
* Move Install OS X dependencies before python setup ([#5488](#5488)) ([35f211c](35f211c))
* Normalize current version by removing 'v' prefix if present ([#5500](#5500)) ([43f3d52](43f3d52))
* Skip macOS 14 with Python 3.10 due to gettext library ([#5490](#5490)) ([41d4977](41d4977))
* Standalone Web UI Publish Workflow ([#5498](#5498)) ([c47b134](c47b134))

### Features

* Added endpoints to allow user to get data for all projects ([4e06965](4e06965))
* Added grpc and rest endpoint for features ([#5519](#5519)) ([0a75696](0a75696))
* Added relationship support to all API endpoints ([#5496](#5496)) ([bea83e7](bea83e7))
* Continue updating doc ([#5523](#5523)) ([ea53b2b](ea53b2b))
* Hybrid offline store ([#5510](#5510)) ([8f1af55](8f1af55))
* Populate created and updated timestamp on data sources ([af3056b](af3056b))
* Provide ready-to-use Python definitions in api ([37628d9](37628d9))
* Snowflake source. fetch MAX in a single query ([#5387](#5387)) ([b49cea1](b49cea1))
* Support compute engine to use multi feature views as source ([#5482](#5482)) ([b9ac90b](b9ac90b))
* Support pagination and sorting on registry apis ([#5495](#5495)) ([c4b6fbe](c4b6fbe))
* Update doc ([#5521](#5521)) ([2808ce1](2808ce1))
HaoXuAI pushed a commit that referenced this pull request Aug 11, 2025