-
Couldn't load subscription status.
- Fork 29
BLS cert all to all. #300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BLS cert all to all. #300
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks great! just a couple of comments
Co-authored-by: Ashwin Sekar <[email protected]>
Co-authored-by: Ashwin Sekar <[email protected]>
…ow into bls_cert_all_to_all_2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, @carllin any other comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM. Left a few comments
| Self { t_ingest } | ||
| } | ||
|
|
||
| fn maybe_update_root_and_send_new_certificates( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this go in root_utils.rs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the sending certificates part should probably stay here.
We could move this part to root_utils.rs.
// Reset standstill timer
debug_assert!(new_finalized_slot > *highest_finalized_slot);
*highest_finalized_slot = new_finalized_slot;
*standstill_timer = Instant::now();
// Set root
let root_bank = root_bank_cache.root_bank();
if root_bank.slot() > *current_root {
*current_root = root_bank.slot();
cert_pool.handle_new_root(root_bank);
}
Maybe cleaner to create a struct with all the mutables from the loop to pass around instead of all the arguments as well:
&mut current_root,
&mut highest_finalized_slot,
&mut standstill_timer,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case I think the standstill_timer maybe doesn't belong here either....
Since this is code refactoring, can this change maybe go into another PR instead of piggying back on this BLS cert all to all change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay w/ doing it in a follow-up 👍
| } | ||
| // Send new certificates to peers | ||
| for certificate in new_certificates_to_send { | ||
| // bls_sender is an unbounded channel, so error only happens if the receiver is disconnected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we bound this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is verified and forwarded by our own certificate pool before getting here, and we should only send once per (slot, cert_type), I think it's okay if we don't bound the channel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we bound, we can pre-allocate a ring buffer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm okay, that's fair. Although I probably won't build certificate resend here, I'll just drop and log error, since in normal cases the buffer shouldn't overflow.
The BLSOps to send out should be biggest during standstill handling. Currently we are saying after 10 seconds standstill, send out all own votes and certificates, since 10/0.4 = 25, and we can have <= 5 votes and <= 5 certs per slot, I'll cap the channel at 512.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
512 sounds reasonable to me. Might be nice to log a drop metric as well (easier to query at scale and see if we have a bug)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Err currently we have no stats in certificate_pool_service at all. Can I do that in a follow up PR to properly add stats for drops and everything else?
| let (bls_sender, bls_receiver) = unbounded(); | ||
| // The BLS sender channel should be mostly used during standstill handling, | ||
| // there could be 10s/400ms = 25 slots, <=5 votes and <=5 certificates per slot, | ||
| // we cap the channel at 512 to give some headroom. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the comment!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the 25 slots is not an upper bound though, only what we would expect if the network recovered in the meantime (if e.g. we have a network partition for >10s this does not hold)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But if the network is in a partition where everyone is in a partition with <60% stake, then we shouldn't have many votes and certs constructed during the partition? If one partition is > 60% stake, then others should be able to observe new finalized cert and continue during recovery.
Co-authored-by: Ashwin Sekar <[email protected]>
Co-authored-by: Ashwin Sekar <[email protected]>
Co-authored-by: Ashwin Sekar <[email protected]>
Co-authored-by: Ashwin Sekar <[email protected]>
Co-authored-by: Ashwin Sekar <[email protected]>
Co-authored-by: Ashwin Sekar <[email protected]>
Co-authored-by: Ashwin Sekar <[email protected]>
Broadcast BLS cert all-to-all.