feat: Add possibility to materialize only latest values, to increase performance #5713
…terialization logic (calling it) Signed-off-by: lukas.valatka <[email protected]>
Signed-off-by: lukas.valatka <[email protected]>
Signed-off-by: lukas.valatka <[email protected]>
…e' of github.com:astronautas/feast into feat/add-selective-deduplicate-pushdown-to-offline-store
Let's re-run tests? Random issue, but no changes to dependency management :/
seems like aws creds have expired @franciscojavierarceo
@ntkathole @jeremyary can you investigate?
HaoXuAI left a comment:
I think it might be better to add the config to the fs.materialize API. That way you can customize materialization per call: enable the pushdown filter for the FeatureViews that need it, and leave it off for the ones that don't.
Why not indeed. I'll check it out and tag you back.
Merged commit 8d77b72 into feast-dev:master
…performance (#5713)

* add pull_all_from_table_or_query for clickhouse, to align with new materialization logic (calling it)

Signed-off-by: lukas.valatka <[email protected]>

* add option to select to materialize only latest values, for performance

Signed-off-by: lukas.valatka <[email protected]>

* enforce non optional params

Signed-off-by: lukas.valatka <[email protected]>

---------

Signed-off-by: lukas.valatka <[email protected]>
Co-authored-by: Lukas Valatka <[email protected]>
# [0.57.0](v0.56.0...v0.57.0) (2025-11-13)

### Bug Fixes

* Improve trino to feast type mapping with (real,varchar,timestamp,decimal) ([#5691](#5691)) ([f855ad2](f855ad2))
* Materialize API - ODFV views not looked-up (thinks views non existant) - crashes materialize ([#5716](#5716)) ([1b050b3](1b050b3))
* Support historical feature retrieval with start_date/end_date in RemoteOfflineStore ([#5703](#5703)) ([ad32756](ad32756))
* Thread safe Clickhouse offline store ([#5710](#5710)) ([5f446ed](5f446ed))

### Features

* Add annotations to cronjob CRDs ([#5701](#5701)) ([be6e6c2](be6e6c2))
* Add batch commit mode for MySQL OnlineStore ([#5699](#5699)) ([3cfe4eb](3cfe4eb))
* Add possibility to materialize only latest values, to increase performance ([#5713](#5713)) ([8d77b72](8d77b72))
* Support table format: Iceberg, Delta, and Hudi ([#5650](#5650)) ([2915ad1](2915ad1))

What this PR does / why we need it:
Adds an option to materialize only the latest values, which pushes deduplication down to the offline store. This reduces client memory consumption and end-to-end duration. The gain is most noticeable for large-scale, latency-critical materialization (think hundreds of thousands of rows across ~150 feature views), as we observed in our ML project at cast.ai.
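To illustrate the idea of pushing deduplication down to the store, here is a minimal sketch using an in-memory SQLite database as a stand-in for an offline store. It is not Feast's implementation: the table name, column names, and both helper functions are hypothetical, and the window-function query is just one common way to select the latest row per entity (it needs SQLite 3.25 or newer).

```python
import sqlite3

# Hypothetical feature rows: (entity key, event timestamp, feature value).
ROWS = [
    ("driver_1", "2025-11-01T00:00:00", 0.10),
    ("driver_1", "2025-11-02T00:00:00", 0.25),
    ("driver_2", "2025-11-01T00:00:00", 0.40),
    ("driver_2", "2025-11-03T00:00:00", 0.55),
    ("driver_2", "2025-11-02T00:00:00", 0.50),
]


def make_store():
    """Create an in-memory table standing in for the offline store."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE driver_stats "
        "(driver_id TEXT, event_timestamp TEXT, conv_rate REAL)"
    )
    conn.executemany("INSERT INTO driver_stats VALUES (?, ?, ?)", ROWS)
    return conn


def pull_all_then_dedupe(conn):
    """Client-side dedup: fetch every row, keep the newest per entity."""
    rows = conn.execute(
        "SELECT driver_id, event_timestamp, conv_rate FROM driver_stats"
    ).fetchall()
    latest = {}
    for driver_id, ts, rate in rows:
        # ISO-8601 strings of equal length compare chronologically.
        if driver_id not in latest or ts > latest[driver_id][0]:
            latest[driver_id] = (ts, rate)
    deduped = sorted((d, ts, r) for d, (ts, r) in latest.items())
    return len(rows), deduped


def pull_latest_pushdown(conn):
    """Pushdown dedup: the store returns one (latest) row per entity."""
    rows = conn.execute(
        """
        SELECT driver_id, event_timestamp, conv_rate FROM (
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY driver_id
                       ORDER BY event_timestamp DESC
                   ) AS rn
            FROM driver_stats
        ) WHERE rn = 1
        """
    ).fetchall()
    return len(rows), sorted(rows)


if __name__ == "__main__":
    conn = make_store()
    print(pull_all_then_dedupe(conn))   # row count transferred, deduped rows
    print(pull_latest_pushdown(conn))   # fewer rows transferred, same result
```

Both functions return the same latest rows; the difference is how many rows leave the store, which is where the memory and duration savings described above would come from.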
Which issue(s) this PR fixes:
#5707 (comment)
Misc
This will be configured via feature store (repo) config file: