feat: Support compute engine to use multi feature views as source #5482
Conversation
Signed-off-by: HaoXuAI <[email protected]>
entities: List[str]
ttl: Optional[timedelta]
source: DataSource
sink_source: Optional[DataSource] = None
Maybe we should just call it sink?
Sounds good, will update it.
Some small nits, but this mostly LGTM.
Can you add a page to the docs before merging this PR? It would be great to share with the community how to use it.
from feast.infra.compute_engines.dag.node import DAGNode


def topo_sort(root: DAGNode) -> List[DAGNode]:
Why not call it topological_sort?
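For reference, a minimal sketch of what such a sort could look like, assuming each DAGNode exposes its upstream nodes through an `inputs` attribute (that field name is an assumption here, not taken from this PR):

```python
from typing import List


def topological_sort(root) -> List:
    """Post-order DFS: every node appears after all nodes it depends on."""
    visited, ordered = set(), []

    def visit(node):
        if id(node) in visited:
            return
        visited.add(id(node))
        for upstream in getattr(node, "inputs", []):  # assumed upstream-edge attribute
            visit(upstream)
        ordered.append(node)

    visit(root)
    return ordered
```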
else None
)
source_views = [
    FeatureView.from_proto(FeatureViewProto(spec=view_spec, meta=None))
The from_proto() method recursively calls itself for each source view without any depth limit. While cycle detection exists in FeatureResolver, it only runs when you use the compute engine, whereas proto deserialization happens much earlier, during API/registry loading.
We might need to handle this in FeatureView.from_proto().
Also, do we not need to store metadata for nested feature views? meta=None?
We need both cycle detection and de-duplication during serialization. It may not cause issues for a few feature views, but if there are many feature views, it could cause slowness.
A -> [B, C]
B -> [D, E]
C -> [D, E]
When serializing FeatureViewA, FeatureViewD and FeatureViewE get serialized twice.
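To illustrate the point, here is a toy sketch (not Feast's actual serialization code) that memoizes already-serialized views so shared upstreams are emitted only once, and keeps an in-progress set so cycles fail fast:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class View:
    name: str
    source_views: List["View"] = field(default_factory=list)


def serialize(view: View, done: Dict[str, dict], in_progress: set) -> dict:
    if view.name in done:          # de-duplication: reuse the earlier result
        return done[view.name]
    if view.name in in_progress:   # cycle detection
        raise ValueError(f"Cycle detected at '{view.name}'")
    in_progress.add(view.name)
    result = {
        "name": view.name,
        "source_views": [serialize(v, done, in_progress) for v in view.source_views],
    }
    in_progress.discard(view.name)
    done[view.name] = result
    return result


# A -> [B, C], B -> [D, E], C -> [D, E]: D and E are serialized only once.
d, e = View("D"), View("E")
a = View("A", [View("B", [d, e]), View("C", [d, e])])
serialize(a, {}, set())
```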
Right, makes sense.
I don't have any metadata required for the compute engine at the moment. What do you think would be useful?
ttl: Optional[timedelta]
batch_source: DataSource
stream_source: Optional[DataSource]
source_views: Optional[List["FeatureView"]]
In __eq__, I think we also need to compare source_views, else two FeatureViews with different source dependencies will be considered equal.
Same for __copy__.
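A toy illustration of the concern (not the actual FeatureView class): if `__eq__` skips `source_views`, two views with different upstream dependencies compare equal, and a `__copy__` that drops them loses the dependency graph.

```python
import copy


class ToyView:
    def __init__(self, name, source_views=None):
        self.name = name
        self.source_views = source_views or []

    def __eq__(self, other):
        return (
            isinstance(other, ToyView)
            and self.name == other.name
            and self.source_views == other.source_views  # compare dependencies too
        )

    def __copy__(self):
        # Carry source_views over so the copy keeps its dependencies.
        return ToyView(self.name, list(self.source_views))


a = ToyView("stats", [ToyView("raw_a")])
b = ToyView("stats", [ToyView("raw_b")])
assert a != b             # differ once source_views is compared
assert copy.copy(a) == a  # copy preserves source_views
```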
Good catch.
Merged it and will add a new PR with the following suggestions.
# [0.51.0](v0.50.0...v0.51.0) (2025-07-21)

### Bug Fixes

* FeatureView serialization with cycle detection ([#5502](#5502)) ([f287ca5](f287ca5))
* Fix current version in publish workflow ([#5499](#5499)) ([0af6e94](0af6e94))
* Fix NPM authentication ([#5506](#5506)) ([9f85892](9f85892))
* Fix verify wheels workflow for macos14 ([#5486](#5486)) ([07174cc](07174cc))
* Fixed error thrown for invalid project name on features api ([#5525](#5525)) ([4a9a5d0](4a9a5d0))
* Fixed ODFV on-write transformations ([271ef74](271ef74))
* Move Install OS X dependencies before python setup ([#5488](#5488)) ([35f211c](35f211c))
* Normalize current version by removing 'v' prefix if present ([#5500](#5500)) ([43f3d52](43f3d52))
* Skip macOS 14 with Python 3.10 due to gettext library ([#5490](#5490)) ([41d4977](41d4977))
* Standalone Web UI Publish Workflow ([#5498](#5498)) ([c47b134](c47b134))

### Features

* Added endpoints to allow user to get data for all projects ([4e06965](4e06965))
* Added grpc and rest endpoint for features ([#5519](#5519)) ([0a75696](0a75696))
* Added relationship support to all API endpoints ([#5496](#5496)) ([bea83e7](bea83e7))
* Continue updating doc ([#5523](#5523)) ([ea53b2b](ea53b2b))
* Hybrid offline store ([#5510](#5510)) ([8f1af55](8f1af55))
* Populate created and updated timestamp on data sources ([af3056b](af3056b))
* Provide ready-to-use Python definitions in api ([37628d9](37628d9))
* Snowflake source. fetch MAX in a single query ([#5387](#5387)) ([b49cea1](b49cea1))
* Support compute engine to use multi feature views as source ([#5482](#5482)) ([b9ac90b](b9ac90b))
* Support pagination and sorting on registry apis ([#5495](#5495)) ([c4b6fbe](c4b6fbe))
* Update doc ([#5521](#5521)) ([2808ce1](2808ce1))
What this PR does / why we need it:
Non-breaking changes, backward compatible.
Supports multiple feature views as the source of a feature view. E.g.:
Diagram: (image not recovered)
APIs:
transformation: a udf you specify yourself, or the default join operation. The default join operation is an inner join on each FeatureView's features, and a left join on the entity df. This unlocks joining multiple data sources, such as SparkSource + SnowflakeSource.
You can do this with a setup like the sketch below:
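An illustrative sketch only; the parameter names (`source` taking a list of feature views, `udf`, `schema`) and the two upstream views `driver_trips_view` / `driver_ratings_view` are assumptions for the example, so check the merged docs for the exact API.

```python
from datetime import timedelta

from feast import BatchFeatureView, Entity, Field
from feast.types import Float64

driver = Entity(name="driver", join_keys=["driver_id"])


# driver_trips_view and driver_ratings_view are assumed to be existing
# FeatureViews, possibly backed by different sources (SparkSource, SnowflakeSource).
def join_and_aggregate(df):
    # Custom transformation applied to the joined upstream views.
    return df.groupby("driver_id").agg({"trip_count": "sum"}).reset_index()


driver_daily_stats = BatchFeatureView(
    name="driver_daily_stats",
    entities=[driver],
    ttl=timedelta(days=1),
    source=[driver_trips_view, driver_ratings_view],  # multiple upstream views
    udf=join_and_aggregate,
    schema=[Field(name="trip_count", dtype=Float64)],
)
```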
Which issue(s) this PR fixes:
#5444 (comment)
Misc
TODO:
stateful store.