-
Notifications
You must be signed in to change notification settings - Fork 845
fix: Use generation id to bootstrap version id #6029
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
sakoush
merged 11 commits into
SeldonIO:v2
from
sakoush:INFRA-1227/use_generation_id_versions
Nov 6, 2024
Merged
fix: Use generation id to bootstrap version id #6029
sakoush
merged 11 commits into
SeldonIO:v2
from
sakoush:INFRA-1227/use_generation_id_versions
Nov 6, 2024
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
lc525
reviewed
Nov 6, 2024
lc525
approved these changes
Nov 6, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. I believe this new approach, being significantly simpler compared to the previous one, really helps us wrt testing. Thank you!
Co-authored-by: Lucian Carata <[email protected]>
jtayl222
pushed a commit
to jtayl222/seldon-core
that referenced
this pull request
Jul 20, 2025
* use generation id to bootstrap versions * add event type for sync operation * move event to response * introduce a second stage of sync process after the scheduler is ready * send the right resources based on the event type for the control plane * update control plane test * add test coverage * update control plane test * allow model progressing status update * add test for generation id * Update scheduler/pkg/store/memory.go Co-authored-by: Lucian Carata <[email protected]> --------- Co-authored-by: Lucian Carata <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What this PR does / why we need it:
This PR adds the ability to use k8s generation ids to bootstrap version ids for models. This is required so that in the case of scheduler restarts (where the state of the models is lost) the versions carry on from the max versions that could be set. In this way during network partitions we guarantee that the new version is going to be bigger than any potentially existing version from the agents and therefore simplifies the reconciliation process.
We also intentionally waits for the synchronisation process to finish before the controller sending any resources state that it has in the cache and therefore we allow agents to connect first and update the scheduler cache with any existing loaded models before proceeding. This is done by doing a 2 step process on the control plane stream between the scheduler and the the controller. The first step is to signal the number of potential servers (agents) to connect then once they connect and advertise their state we proceed with the second stage. We introduce an event type on this stream to differentiate between the 2 operations.
Another change is to allow status updates for models that are stuck in ModelProgressing state, this is done in
updateLoadedModelsImpl
Which issue(s) this PR fixes:
Fixes INFRA-1227 (internal)
Special notes for your reviewer: