Add local trigger for ParentReady. #404
Conversation
a PR description would be nice :)

Done.
/// Under normal cases we should have a parent ready for first slot of every window.
/// But it could be we joined when the later slots of the window are finalized, then
/// we never saw the parent ready for the first slot and haven't voted for first slot
how is the problem that we haven't voted for the first slot resolved?
Once ParentReady is triggered for the first slot, we will vote Notarize for all previous blocks in the window.
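A minimal sketch of that unblocking effect, with stand-in types and a hypothetical `replayed` lookup (the real logic lives in `try_notar` in votor/src/event_handler.rs):

```rust
// Sketch only: once ParentReady fires for the first slot of a window, every
// already-replayed block in the window becomes eligible for a Notarize vote,
// in slot order. `replayed` is a hypothetical stand-in for a bank_forks lookup.
type Slot = u64;
type BlockId = u64; // stand-in; the real block id is a hash

fn notarizable_after_parent_ready(
    first_slot_of_window: Slot,
    last_slot_of_window: Slot,
    replayed: impl Fn(Slot) -> Option<BlockId>,
) -> Vec<(Slot, BlockId)> {
    // Voting proceeds consecutively, so stop at the first slot not yet replayed.
    (first_slot_of_window..=last_slot_of_window)
        .map_while(|slot| replayed(slot).map(|id| (slot, id)))
        .collect()
}
```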
/// But it could be we joined when the later slots of the window are finalized, then
/// we never saw the parent ready for the first slot and haven't voted for first slot
/// so we can't keep processing rest of the window. This is especially a problem for
/// cluster standstill.
Could you lay out the exact details/example of when this is a problem? Not every cluster standstill needs this fix.
Added.
votor/src/event_handler.rs (Outdated)
return None;
}
if vctx.vote_history.highest_parent_ready_slot() >= Some(first_slot_of_window)
    || !local_context.finalized_blocks.contains(&block)
If you root a later block and clear the finalized block from the finalized_blocks set, is it guaranteed add_missing_parent_ready will also run on some later block in the window?
We currently have a check that we only root a block if we have voted for the block:
alpenglow/votor/src/event_handler.rs, line 620 in 2b71a11:
&& vctx.vote_history.voted(slot)
So, if we have rooted a later block, that means we don't need this ParentReady any more; we can keep voting for other blocks in the window.
We only need to add the missing parent ready if we haven't rooted any block in the window (see the sketch below).
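A condensed sketch of that rooting rule, assuming a simplified `VoteHistory`; the real condition is the `&& vctx.vote_history.voted(slot)` check quoted above:

```rust
use std::collections::HashSet;

// Simplified stand-in for the real vote_history in votor.
struct VoteHistory {
    voted_slots: HashSet<u64>,
}

impl VoteHistory {
    fn voted(&self, slot: u64) -> bool {
        self.voted_slots.contains(&slot)
    }
}

// A slot may only become root once we have voted on it, so a root inside the
// window implies voting already started there and the local ParentReady
// trigger is unnecessary. `finalized` is a hypothetical precomputed flag.
fn can_root(finalized: bool, slot: u64, vote_history: &VoteHistory) -> bool {
    finalized && vote_history.voted(slot)
}
```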
votor/src/event_handler.rs (Outdated)
// If the block is missing, we can't trigger parent ready
let bank_forks_r = ctx.bank_forks.read().unwrap();
let bank = bank_forks_r.get(slot)?;
let first_slot_of_window_bank = bank_forks_r.get(first_slot_of_window)?;
what happens if a later root in the window has been set already and this first_slot_of_window has already been purged from bank forks?
If somehow any block in the window is set to root, we will get None here, and add_missing_parent_ready will return None, in which case we do nothing.
Won't you get stuck again in this scenario if we do nothing, i.e. C can't vote?
alpenglow/votor/src/event_handler.rs, lines 447 to 449 in 51f8095:
/// A and B finalize block together up to slot 9, now A exited and C joined.
/// C sees block 9 as finalized, but it never had parent ready triggered for slot 8.
/// C can't vote for any slot in the window because there is no parent ready for slot 8.
ping
There are two cases where the middle of the window can be set to root:
- A ParentReady somehow triggered for the first slot of the window, in which case C will be unstuck.
- We somehow restarted on a slot in the middle of the window, in which case we should get a ParentReady on the slot we restart from.

That said, since in case 2 we trigger a ParentReady on the slot we restart from, I'm now thinking I should do the same in this PR: just trigger a ParentReady on block 9 instead of block 8 (see the sketch below). Let me try that and see if the local-cluster tests still pass.
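A hedged sketch of that direction, mirroring the shape of the restart path quoted later in this thread (`ParentReady { slot: root + 1, parent_block: root_block }`); the types are simplified stand-ins:

```rust
// Sketch only: trigger ParentReady on the finalized block itself (block 9 in
// the example) rather than tracing back to the first slot of the window.
type Slot = u64;
type Hash = [u8; 32];
type Block = (Slot, Hash);

enum VotorEvent {
    ParentReady { slot: Slot, parent_block: Block },
}

fn local_parent_ready(finalized_block: Block) -> VotorEvent {
    let (slot, _block_id) = finalized_block;
    // Same shape as the restart path: the slot after the finalized block
    // becomes votable with the finalized block as its parent.
    VotorEvent::ParentReady {
        slot: slot.checked_add(1).unwrap(),
        parent_block: finalized_block,
    }
}
```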
So in this scenario here:
alpenglow/votor/src/event_handler.rs, lines 447 to 449 in 51f8095:
/// A and B finalize block together up to slot 9, now A exited and C joined.
/// C sees block 9 as finalized, but it never had parent ready triggered for slot 8.
/// C can't vote for any slot in the window because there is no parent ready for slot 8.
How is the bug triggered? What is the case in which a node could see root 9 without triggering ParentReady on 8? I thought we always send a ParentReady on restart: #404 (comment)
Block 9 wasn't the root for C. C has its root somewhere around block 0, I think, because it didn't join until A exited. It saw all the finalization certs A and B created, but never had a chance to vote on any block. Because C doesn't have ParentReady for block 8, it will never vote for block 8 or block 9, so the check "we voted for this slot" isn't true and we can't root block 8 or 9.
Does that make sense?
ah because we only trigger ParentReady on restart for the current root bank:
alpenglow/votor/src/certificate_pool_service.rs, lines 207 to 213 in fb7a697:
let root_bank = ctx.root_bank_cache.root_bank();
let root_block = (root_bank.slot(), root_bank.block_id().unwrap_or_default());
let mut highest_parent_ready = root_bank.slot();
events.push(VotorEvent::ParentReady {
    slot: root_bank.slot().checked_add(1).unwrap(),
    parent_block: root_block,
});
Can we solve this by triggering ParentReady for all the leaves in bank forks?
We can, but that seems to be going the long way, and it might not always work.
Think about what happens to a naive validator that joins with its root at block 0.
Now the cluster is at block 10001; it needs to do ParentReady and repair all the blocks between 0 and 10001, which may or may not be available on its neighbors.
Instead, I believe the researchers designed the fast way (sketched after this list):
- I joined the cluster and found out everyone else rooted block 10001.
- I repaired and replayed block 10001, triggered a local ParentReady so I can Notarize block 10001, marked 10001 as my new root, and triggered votes for 10002, 10003.
- Now I can join the cluster and vote happily with everyone else.

This is especially useful during cluster standstill: sync in the fastest way possible and move on.
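A purely illustrative sketch of that flow; none of the helper names below (`repair_and_replay`, `trigger_local_parent_ready`, `set_root`) are real alpenglow APIs:

```rust
type Slot = u64;

// Hypothetical catch-up path for a validator that joins far behind the cluster.
fn catch_up(cluster_root: Slot) {
    // 1. Repair and replay only the block the rest of the cluster has rooted,
    //    instead of every block since our own stale root.
    repair_and_replay(cluster_root);
    // 2. Trigger a local ParentReady so we can Notarize it and adopt it as root.
    trigger_local_parent_ready(cluster_root);
    set_root(cluster_root);
    // 3. From here on we vote normally on cluster_root + 1, + 2, ...
}

fn repair_and_replay(_slot: Slot) { /* hypothetical */ }
fn trigger_local_parent_ready(_slot: Slot) { /* hypothetical */ }
fn set_root(_slot: Slot) { /* hypothetical */ }
```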
Yeah, I guess it doesn't solve the problem if the new root is made off a non-leaf node in bank forks. Makes sense.
AshwinSekar
left a comment
Echoing carl's comments, I believe it's important to check the failure cases where state is cleaned up too early.
I think we need to ensure that we don't root bank forks until we're sure that we don't need to mark a local parent ready; probably easiest to enforce that we've emitted a ParentReady for the window, or simply change the order of the two checks (see the sketch below).
The other place I was looking at was should_ignore, but I think that should also be fine if we change the order.
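A simplified sketch of that ordering; `add_missing_parent_ready` and `maybe_set_root` here are stand-ins, not the exact alpenglow signatures:

```rust
type Slot = u64;
type Hash = [u8; 32];
type Block = (Slot, Hash);

fn handle_finalized(block: Block) {
    // Step 1: the local trigger reads bank_forks and finalized_blocks, so it
    // must run while those entries still exist.
    if let Some((ready_slot, parent_block)) = add_missing_parent_ready(block) {
        emit_parent_ready(ready_slot, parent_block);
    }
    // Step 2: rooting may purge banks below the new root, so it runs second.
    maybe_set_root(block);
}

// Stand-ins for the real functions.
fn add_missing_parent_ready(_block: Block) -> Option<(Slot, Block)> { None }
fn emit_parent_ready(_slot: Slot, _parent: Block) {}
fn maybe_set_root(_block: Block) {}
```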
// We haven't finished replay for the block, so we can't trigger parent ready
return None;
}
if bank.block_id() != Some(block_id) {
Pointing this out as a sanity check, I believe it is true that a block within the leader window being finalized implies that the first slot in the window is also finalized (due to the consecutive check in tryNotar).
Thus it's impossible for this block_id to match but for first_slot_of_window_bank not to be an ancestor of bank.
Yeah, I think the only case where block_id matches here but we fail to get first_slot_of_window_bank later is when we somehow rooted in the middle. That means we somehow started voting in this window, so we don't need this extra ParentReady.
Hmm, maybe the only exception is when we do a cluster restart from a slot in the middle of the window. In that case, do we set up ParentReady and timeouts correctly now?
yes, cluster restart should be fine; we kick off a local parent ready when the node restarts.
my concern was whether the first_slot_of_window_bank in our bank_forks could be for a different version than the ancestor of block_id. I don't think this is possible though, just wanted to point it out.
Hmm, so
events.push(VotorEvent::ParentReady {
says we can trigger ParentReady from any slot (not just the first slot of the window).
Should we trigger a ParentReady from the finalized slot as well?
And I don't think set_timeouts correctly handles setting timeouts from the middle of a window?
@AshwinSekar I changed ParentReady to trigger on the finalized block instead of the first slot of the window, to be consistent with restarting on a slot in the middle, what do you think? Tests look happy.
sorry I missed this, I believe this should be fine. The key is that we'll be able to vote skip and notarize-fallback if needed.
Sorry, I'm confused, which order should I change here?

I was mistaken, I think this is fine, since as you pointed out we only clean up after we have voted on the slot. Since the first leader slot requires a parent ready to vote, and the slots in the window require us to vote on the previous slot, I don't think it's possible that we need to get into a local trigger scenario but have rooted something in the window.
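A compact sketch of that voting precondition, under the simplifying assumption that the window is a contiguous slot range:

```rust
type Slot = u64;

// Whether we may vote Notarize on `slot`: the first leader slot of a window
// needs a ParentReady, and every later slot needs a vote on the previous slot.
fn may_vote(
    slot: Slot,
    first_slot_of_window: Slot,
    parent_ready: bool,
    voted_previous_slot: bool,
) -> bool {
    if slot == first_slot_of_window {
        parent_ready
    } else {
        voted_previous_slot
    }
}
```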
votor/src/event_handler.rs (Outdated)
/// While B is stuck because it is waiting for >60% of the votes to finalize slot 9.
/// The cluster will get stuck.
/// After we add the following function, C will see that block 9 is finalized yet
/// it never had parent ready for slot 8, so it will find the block id of 8 from
What if C started from a snapshot of 9 and has thus rooted 9, and doesn't have 8 in its blockstore?
See Ashwin's explanation above: "yes cluster restart should be fine, we kick off a local parent ready when the node restarts."
events.push(VotorEvent::ParentReady {
ah makes sense thanks
votor/src/event_handler.rs (Outdated)
return None;
};
info!(
    "{}: Triggering parent ready for missing first slot of window {slot} \
nit: no longer first slot of window, maybe better to just say local trigger.
Changed.
votor/src/event_handler.rs (Outdated)
}
// If the block is missing, we can't trigger parent ready
let bank_forks_r = ctx.bank_forks.read().unwrap();
let bank = bank_forks_r.get(slot)?;
nit: get already clones the bank so you can just do
let bank = ctx.bank_forks.read().unwrap().get(slot)?;
Done.
carllin
left a comment
Looks good, thanks for thinking through the nuances
// No need to trigger parent ready for the first slot of the window
return None;
}
if vctx.vote_history.highest_parent_ready_slot() >= Some(first_slot_of_window)
This is safe because if a slot in this window got finalized, there cannot exist a skip certificate for this window, so there's no other fork that can have a higher parent ready, right?
It could also be that the cluster finalized this slot and the slots after it, but you are replaying blocks super slowly, so you only just got to this slot.
In any case, I think if we have ParentReady for a slot in a future window, we shouldn't care about the current window any more; the best strategy is to fast-forward to that future window (see the guard sketched below).
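A minimal sketch of the guard quoted above; note that Rust's `Option` ordering makes `None < Some(_)`, so a validator with no ParentReady at all never skips the trigger:

```rust
type Slot = u64;

// Skip the local trigger when a ParentReady already exists at or beyond this
// window's first slot: voting should fast-forward to the newer window instead.
fn should_skip_local_trigger(
    highest_parent_ready_slot: Option<Slot>,
    first_slot_of_window: Slot,
) -> bool {
    // Option<u64> ordering: None < Some(x), and Some(a) >= Some(b) iff a >= b.
    highest_parent_ready_slot >= Some(first_slot_of_window)
}
```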
votor/src/event_handler.rs (Outdated)
/// cluster. If we get a finalization cert for later slots of the window and we have the
/// block replayed, trace back to the first slot of the window and emit parent ready.
fn add_missing_parent_ready(
    block: Block,
nit: rename to finalized_block to emphasize this must be finalized
Done.
votor/src/event_handler.rs (Outdated)
received_shred,
stats,
)?;
if let Some((slot, block)) =
nit: (ready_slot, parent_block)
Done.
AshwinSekar
left a comment
Thanks!
A lot of code (try_notar, set_timeouts) is only triggered when the first slot of the window gets the ParentReady event.
However, a validator can join the cluster at any slot. If a validator joins in the middle of a window, ParentReady currently doesn't get triggered for the first slot, so the timers for later slots in the window aren't set, and it can't vote Notarize on the slots in the window either.
This is especially a problem during a cluster standstill: if enough validators are stuck in this state, the cluster may never recover.
When a validator sees that a later slot in the window is finalized, it already has that block and all its ancestors, so we mark the first slot of the window ParentReady, allowing us to start voting on the rest of the slots in this window (a condensed sketch follows).
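To tie the thread together, here is a condensed, hedged sketch of the local trigger, stitched from the fragments quoted in the review; parameter and helper names outside those fragments are assumptions, and per the discussion above the final version triggers ParentReady on the finalized block itself rather than on the first slot of the window:

```rust
use std::collections::HashSet;

type Slot = u64;
type Hash = [u8; 32];
type Block = (Slot, Hash);

/// Sketch of the local ParentReady trigger (not the exact implementation).
/// Returns the (slot, parent_block) pair for a ParentReady event, or None if
/// no local trigger is needed or possible.
fn add_missing_parent_ready(
    finalized_block: Block,
    first_slot_of_window: Slot,
    highest_parent_ready_slot: Option<Slot>,
    finalized_blocks: &HashSet<Block>,
    replayed_block_id: impl Fn(Slot) -> Option<Hash>, // stand-in for bank_forks
) -> Option<(Slot, Block)> {
    let (slot, block_id) = finalized_block;
    // A ParentReady at or beyond this window already exists, or the block is
    // not actually in our finalized set: nothing to do.
    if highest_parent_ready_slot >= Some(first_slot_of_window)
        || !finalized_blocks.contains(&finalized_block)
    {
        return None;
    }
    // We haven't finished replay for this block (or replayed a different
    // version), so we can't trigger parent ready yet.
    if replayed_block_id(slot) != Some(block_id) {
        return None;
    }
    // Trigger ParentReady on the finalized block itself, consistent with the
    // restart path: the next slot becomes votable with this parent.
    Some((slot.checked_add(1)?, finalized_block))
}
```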