Conversation

@wen-coding
Contributor

@wen-coding wen-coding commented Aug 19, 2025

A lot of code paths (try_notar, set_timeouts) are only triggered when the first slot of the window gets the ParentReady event.

However, a validator can join the cluster at any slot. If a validator joins in the middle of a window, ParentReady currently never fires for the first slot, so the timers for the later slots in the window are never set, and the validator can't vote Notarize on any slot in the window either.

This is especially a problem during a cluster standstill: if enough validators are stuck in this state, the cluster may never recover.

When a validator sees that a later slot in the window is finalized and it has that block and all of its ancestors, mark the first slot of the window ParentReady, so that it can start voting on the rest of the slots in this window.
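
To make the mechanism concrete, here is a minimal, self-contained sketch in Rust. The types, the window length, and the function shape are simplified stand-ins for illustration, not the real votor code. (As discussed later in this thread, the merged version triggers ParentReady on the finalized block itself rather than on the first slot of the window; the sketch follows that.)

type Slot = u64;
type BlockId = u64; // stand-in; the real block id is a hash
type Block = (Slot, BlockId);

const WINDOW_SIZE: Slot = 4; // assumed window length, for illustration only

fn first_slot_of_window(slot: Slot) -> Slot {
    slot - slot % WINDOW_SIZE
}

// If a finalized block sits mid-window and we never saw ParentReady for its
// window, return the (slot, parent_block) pair to emit as a local ParentReady.
fn local_parent_ready(
    finalized_block: Block,
    highest_parent_ready_slot: Option<Slot>,
    replayed_block_id: Option<BlockId>, // what replay produced for this slot, if done
) -> Option<(Slot, Block)> {
    let (slot, block_id) = finalized_block;
    // Already have a parent ready at or past this window: nothing to do.
    if highest_parent_ready_slot >= Some(first_slot_of_window(slot)) {
        return None;
    }
    // Replay must have completed and matched the finalized block id.
    if replayed_block_id != Some(block_id) {
        return None;
    }
    // Unblocks voting on the remaining slots of the window.
    Some((slot + 1, finalized_block))
}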

@wen-coding wen-coding marked this pull request as draft August 19, 2025 00:30
@wen-coding wen-coding linked an issue Aug 19, 2025 that may be closed by this pull request
@wen-coding wen-coding marked this pull request as ready for review August 19, 2025 00:59
@carllin
Contributor

carllin commented Aug 19, 2025

a PR description would be nice :)

@wen-coding
Contributor Author

a PR description would be nice :)

Done.


/// Under normal circumstances we should have a parent ready for the first slot of every window.
/// But it could be that we joined when the later slots of the window were finalized, then
/// we never saw the parent ready for the first slot and haven't voted for the first slot
Contributor

how is the problem that we haven't voted for the first slot resolved?

Contributor Author

Once ParentReady is triggered for the first slot, we will vote Notarize for all previous blocks in the window.
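
A toy model of that cascade, with hypothetical shapes (the real gating lives in try_notar): each slot is votable only if its predecessor in the window was voted on, or it is the first slot and ParentReady fired.

type Slot = u64;

fn notarizable_slots(window: &[Slot], parent_ready_for_first: bool) -> Vec<Slot> {
    let mut voted: Vec<Slot> = Vec::new();
    for (i, &slot) in window.iter().enumerate() {
        let parent_ok = if i == 0 {
            // The first slot needs ParentReady for the window.
            parent_ready_for_first
        } else {
            // Later slots need a vote on the immediately preceding slot.
            voted.contains(&window[i - 1])
        };
        if parent_ok {
            voted.push(slot);
        }
    }
    voted
}

// notarizable_slots(&[8, 9, 10, 11], false) == vec![]             (the stuck case)
// notarizable_slots(&[8, 9, 10, 11], true)  == vec![8, 9, 10, 11] (unstuck)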

/// But it could be that we joined when the later slots of the window were finalized, then
/// we never saw the parent ready for the first slot and haven't voted for the first slot,
/// so we can't keep processing the rest of the window. This is especially a problem
/// during a cluster standstill.
Contributor

Could you lay out the exact details/example of when this is a problem? Not every cluster standstill needs this fix.

Contributor Author

Added.

    return None;
}
if vctx.vote_history.highest_parent_ready_slot() >= Some(first_slot_of_window)
    || !local_context.finalized_blocks.contains(&block)
Contributor

If you root a later block and clear the finalized block from the finalized_blocks set, is it guaranteed that add_missing_parent_ready will also run on some later block in the window?

Contributor Author

We currently have a check that we only root a block if we have voted for the block:

&& vctx.vote_history.voted(slot)

So if we have rooted a later block, that means we don't need this ParentReady any more; we can keep voting for other blocks in the window.
We only need to add the missing parent ready if we haven't rooted any block in the window.
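
A small sketch of that invariant (toy state, not the real VoteHistory): rooting is gated on having voted, so a rooted slot in the window implies voting already progressed past the ParentReady hurdle.

use std::collections::HashSet;

type Slot = u64;

struct VoteHistory {
    voted: HashSet<Slot>,
}

impl VoteHistory {
    // Mirrors the quoted guard: only a slot we have voted on may become root.
    fn can_root(&self, slot: Slot) -> bool {
        self.voted.contains(&slot)
    }
}

// We only need the local ParentReady when no slot in the window could have
// been rooted, i.e. when none of them has been voted on.
fn needs_local_parent_ready(history: &VoteHistory, window: &[Slot]) -> bool {
    !window.iter().any(|&s| history.can_root(s))
}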

// If the block is missing, we can't trigger parent ready
let bank_forks_r = ctx.bank_forks.read().unwrap();
let bank = bank_forks_r.get(slot)?;
let first_slot_of_window_bank = bank_forks_r.get(first_slot_of_window)?;
Contributor

what happens if a later root in the window has already been set and this first_slot_of_window has already been purged from bank forks?

Contributor Author

If somehow any block in the window has been set to root, we will get None here, add_missing_parent_ready will return None, and we will do nothing.

Contributor

@carllin carllin Aug 20, 2025

Won't you get stuck again in this scenario if we do nothing, i.e. C can't vote?

/// A and B finalize blocks together up to slot 9; then A exits and C joins.
/// C sees block 9 as finalized, but it never had parent ready triggered for slot 8.
/// C can't vote for any slot in the window because there is no parent ready for slot 8.

Contributor

ping

Contributor Author

There are two cases where a slot in the middle of the window can be set to root:

  1. A ParentReady somehow triggered for the first slot of the window, in which case C will be unstuck.
  2. We somehow restarted on a slot in the middle of the window, in which case we get a ParentReady on the slot we restart from.

That said, since in case 2 we trigger a ParentReady on the slot we restart from, I'm now thinking I should do the same in this PR: just trigger a ParentReady on block 9 instead of block 8, as sketched below. Let me try that and see if the local-cluster tests still pass.
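
For reference, a hedged sketch of what that would look like, reusing the VotorEvent::ParentReady shape quoted from the restart path later in this thread (finalized is a hypothetical (slot, block_id) pair, not a real variable in the PR):

// Trigger on the finalized block (e.g. block 9) instead of the first slot of
// the window (block 8), mirroring what restart does for the root bank.
events.push(VotorEvent::ParentReady {
    slot: finalized.0.checked_add(1).unwrap(),
    parent_block: finalized,
});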

Contributor

@carllin carllin Aug 21, 2025

So in this scenario here:

/// A and B finalize blocks together up to slot 9; then A exits and C joins.
/// C sees block 9 as finalized, but it never had parent ready triggered for slot 8.
/// C can't vote for any slot in the window because there is no parent ready for slot 8.

How is the bug triggered? What is the case in which a node could see root 9 without triggering ParentReady on 8? I thought we always send a ParentReady on restart: #404 (comment)

Contributor Author

Block 9 wasn't the root for C. C has its root somewhere around block 0, I think, because it didn't join until A exited. It saw all the finalization certs A and B created, but never had a chance to vote on any block. Because C doesn't have ParentReady for block 8, it will never vote for block 8 or block 9, so the check "we voted for this slot" isn't true and we can't root blocks 8 and 9.

Does that make sense?

Contributor

@carllin carllin Aug 21, 2025

ah because we only trigger ParentReady on restart for the current root bank:

let root_bank = ctx.root_bank_cache.root_bank();
let root_block = (root_bank.slot(), root_bank.block_id().unwrap_or_default());
let mut highest_parent_ready = root_bank.slot();
events.push(VotorEvent::ParentReady {
    slot: root_bank.slot().checked_add(1).unwrap(),
    parent_block: root_block,
});

Can we solve this by triggering ParentReady for all the leaves in bank forks?

Contributor Author

@wen-coding wen-coding Aug 21, 2025

We can, but that seems to be going the long way around, and it might not always work.
Think about what happens to a naive validator that joined with its root at block 0.
If the cluster is now at block 10001, it needs to do ParentReady and repair all the blocks between 0 and 10001, which may or may not be available on its neighbors.

Instead, I believe the researchers designed the fast way:

  • I join the cluster and find out everyone else rooted block 10001
  • I repair and replay block 10001, trigger a local ParentReady so I can Notarize block 10001, mark 10001 my new root, and trigger votes for 10002 and 10003
  • Now I can join the cluster and vote happily with everyone else

This is especially useful during a cluster standstill: sync in the fastest way possible and move on.

Contributor

Yeah, I guess it doesn't solve the problem if the new root is made off a non-leaf node in bank forks. Makes sense.

Contributor

@AshwinSekar AshwinSekar left a comment

Echoing Carl's comments, I believe it's important to check the failure cases where state is cleaned up too early.

I think we need to ensure that we don't root bank forks until we're sure that we don't need to mark a local parent ready. It's probably easiest to enforce that we've emitted a ParentReady for the window, or simply to change the order of the two checks.

The other place I was looking at was should_ignore, but I think that should also be fine if we change the order.

    // We haven't finished replay for the block, so we can't trigger parent ready
    return None;
}
if bank.block_id() != Some(block_id) {
Contributor

Pointing this out as a sanity check: I believe it is true that a block within the leader window being finalized implies that the first slot in the window is also finalized (due to the consecutive check in tryNotar).
Thus it's impossible for this block_id to match but for first_slot_of_window_bank not to be an ancestor of bank.

Contributor Author

Yeah, I think the only case where block_id matches here but we fail to get first_slot_of_window_bank later is that we somehow rooted in the middle. Which means we somehow started voting in this window, so we don't need this extra ParentReady.

Hmm, maybe the only exception is when we do a cluster restart from a slot in the middle of the window. In that case, do we set up ParentReady and the timeouts correctly now?

Contributor

yes, cluster restart should be fine; we kick off a local parent ready when the node restarts.

my concern was that the first_slot_of_window_bank in our bank_forks could be a different version than the ancestor of block_id. I don't think this is possible, though; just wanted to point it out.

Contributor Author

Hmm, so

events.push(VotorEvent::ParentReady {

says we can trigger ParentReady from any slot (not just the first slot of the window).

Should we trigger a ParentReady from the finalized slot as well?

Contributor Author

And I don't think set_timeouts correctly handles setting timeouts from the middle of a window?

Contributor Author

@AshwinSekar I changed ParentReady to trigger on the finalized block instead of the first slot of the window, to be consistent with restarting on a slot in the middle. What do you think? Tests look happy.

Contributor

sorry I missed this, I believe this should be fine. The key is that we'll be able to vote skip and notarize-fallback if needed.

@wen-coding
Contributor Author

Echoing Carl's comments, I believe it's important to check the failure cases where state is cleaned up too early.

I think we need to ensure that we don't root bank forks until we're sure that we don't need to mark a local parent ready. It's probably easiest to enforce that we've emitted a ParentReady for the window, or simply to change the order of the two checks.

The other place I was looking at was should_ignore, but I think that should also be fine if we change the order.

Sorry, I'm confused: which order should I change here?

@AshwinSekar
Contributor

Echoing Carl's comments, I believe it's important to check the failure cases where state is cleaned up too early.
I think we need to ensure that we don't root bank forks until we're sure that we don't need to mark a local parent ready. It's probably easiest to enforce that we've emitted a ParentReady for the window, or simply to change the order of the two checks.
The other place I was looking at was should_ignore, but I think that should also be fine if we change the order.

Sorry, I'm confused: which order should I change here?

I was mistaken; I think this is fine since, as you pointed out, we only clean up after we have voted on the slot. Since the first leader slot requires a parent ready to vote, and the slots in the window require us to vote on the previous slot, I don't think it's possible that we need to get into a local trigger scenario but have rooted something in the window.

/// While B is stuck because it is waiting for >60% of the votes to finalize slot 9.
/// The cluster will get stuck.
/// After we add the following function, C will see that block 9 is finalized yet
/// it never had parent ready for slot 8, so it will find the block id of 8 from
Contributor

What if C started from a snapshot of 9 and has thus rooted 9, and doesn't have 8 in its blockstore?

Contributor Author

See Ashwin's explanation above: "yes, cluster restart should be fine; we kick off a local parent ready when the node restarts."

Contributor Author

events.push(VotorEvent::ParentReady {

Contributor

@carllin carllin Aug 21, 2025

ah makes sense thanks

@wen-coding wen-coding requested a review from carllin August 21, 2025 21:31
    return None;
};
info!(
    "{}: Triggering parent ready for missing first slot of window {slot} \
Contributor

nit: no longer the first slot of the window; maybe better to just say "local trigger".

Contributor Author

Changed.

}
// If the block is missing, we can't trigger parent ready
let bank_forks_r = ctx.bank_forks.read().unwrap();
let bank = bank_forks_r.get(slot)?;
Contributor

nit: get already clones the bank, so you can just do
let bank = ctx.bank_forks.read().unwrap().get(slot)?;

Contributor Author

Done.

carllin
carllin previously approved these changes Aug 22, 2025
Contributor

@carllin carllin left a comment

Looks good, thanks for thinking through the nuances

    // No need to trigger parent ready for the first slot of the window
    return None;
}
if vctx.vote_history.highest_parent_ready_slot() >= Some(first_slot_of_window)
Contributor

This is safe because if a slot in this window got finalized, there cannot exist a skip certificate for this window, so there's no other fork that can have a higher parent ready, right?

Contributor Author

It could also be that the cluster finalized this slot and the slots after it, but you are replaying blocks super slowly, so you only just got to this slot.

In any case, I think if we have ParentReady for a slot in a future window, we shouldn't care about the current window any more; the best strategy is to fast-forward to that future window.

/// cluster. If we get a finalization cert for later slots of the window and we have the
/// block replayed, trace back to the first slot of the window and emit parent ready.
fn add_missing_parent_ready(
    block: Block,
Contributor

nit: rename to finalized_block to emphasize this must be finalized

Contributor Author

Done.

    received_shred,
    stats,
)?;
if let Some((slot, block)) =
Contributor

nit: (ready_slot, parent_block)

Contributor Author

Done.

AshwinSekar
AshwinSekar previously approved these changes Aug 22, 2025
Contributor

@AshwinSekar AshwinSekar left a comment

Thanks!

@wen-coding wen-coding merged commit be47612 into anza-xyz:master Aug 22, 2025
7 checks passed
@wen-coding wen-coding deleted the local_trigger_for_parent_ready branch August 22, 2025 18:16

Development

Successfully merging this pull request may close these issues.

Add local trigger for ParentReady
