-
Notifications
You must be signed in to change notification settings - Fork 21
Proposal fixes #310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal fixes #310
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Diego covered pretty much everything. Also just seconded some small terminology clarification on my end.
@dknopik, can we add tests to be sure (and avoid regressions) that there's no race related to the round timeout? |
Sure. Do you want me to do so? If you want, you can take a stab at it instead - that might prevent issues like me "overfitting" the test on my mental model of the issue. |
Feel free to do it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the work on this one, it's really great.
@diegomrsantos thanks for the idea about a test - I found another minor data race thanks to it. We removed the message round check in the qbft crate as we now check for this in message validation. However, as the message might be handled by the qbft crate after the round has advanced, this caused an interesting issue where a message belonging to a previous round change arrived and caused an instance to revert to a previous round. I reintroduced the round check. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors block proposal signing, improves round timeout handling, increases network message size limits, and updates SSZ version encoding.
- Introduces a
SignableBlock
trait and rewritesdecide_abstract_block
/sign_abstract_block
to handle fork-aware SSZ and unified unsigned/signed block types. - Enhances QBFT manager to compute timeouts dynamically, adds tests for late-initialization scenarios, and removes static
round_end
. - Adds a 5 MiB
max_transmit_size
, replaces rawDataVersion
constants with aForkName
-wrapped type, and updates SSZ encode/decode accordingly.
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
File | Description |
---|---|
validator_store/src/lib.rs | Refactored block decision/signing with SignableBlock trait |
qbft_manager/src/tests.rs | Added test timeout, late-initialization parameter and test |
qbft_manager/src/instance.rs | Removed static round_end , compute timeout dynamically |
network/src/network.rs | Set max_transmit_size to 5 000 000 bytes |
common/ssv_types/src/consensus.rs | Replaced DataVersion /DataSsz with ForkName -based wrapper |
common/qbft/src/lib.rs | Adjusted get_round return type and added round-check in message handling |
Comments suppressed due to low confidence (3)
anchor/validator_store/src/lib.rs:388
- [nitpick] The parameter name
signable_block
and local variableblock
are reused and shadowed by subsequentlet block = ...
bindings, which can be confusing. Consider renaming the parameter toproposal
orpayload
and the result variable tounsigned_block
for clarity.
signable_block: impl SignableBlock<E>,
anchor/qbft_manager/src/tests.rs:129
- [nitpick] This test panics on timeout, which aborts the suite. Consider using
assert!(false, "test timed out")
or emitting a clearer failure message so that test harnesses report it as a failed assertion rather than an uncaught panic.
let timeout = sleep(TEST_TIMEOUT);
anchor/validator_store/src/lib.rs:413
- Calling
.into()
on&ForkName
may not resolve the intendedDataVersion
conversion if onlyFrom<ForkName>
is implemented. Consider explicitly cloning or copying theForkName
(e.g.,block.fork_name_unchecked().clone().into()
) to avoid type inference issues.
let block_version = block.fork_name_unchecked().into();
This PR fixes proposals! The commits fix several issues, I am not sure which are strictly necessary to fix proposals, but each is an improvement anyway.
Decode
DataSsz
variants explicitly based on duty and fork: Currently, Fulu and Electra blocks are identical when serialized in SSZ. Therefore, when we deserialize/deserialise the block,from_ssz_bytes
accidentally decodes the block into theFulu
variant, causing us to send a Fulu block to the BN, which in turn refuses to publish it. The commit explicitly decodes the block using theversion
field from the decided data and similarly decodes other objects directly as well in other duties (to be safe).Set
max_transmit_size
: Some messages with large blocks failed to be sent as they are quite large - we setmax_transmit_size
to a large enough value of 5 million bytes.Outdated, see "Improve fix for round timeouts" below - Avoid conflict between timer-based and message-based round increments: If we receive a quorum of round change messages, we set our own round to the round mentioned in these messages. However, if a timer tick arrives after, it increased the round regardless - even though actually it meant to increase the round to the round found in those messages. We fix this by tracking the round requested by ticks separately.
Remember PREPARE even if we have not received the PROPOSE yet: We discarded PREPARE messages if we have not received a PROPOSE message yet - which in some cases resulted in missed quorums. We record unknown prepare messages, but still only check for quorum if we have the PROPOSE message.
Use current slot for RANDAO Reveal: We need to use the current slot instead of the last slot of the epoch for go-ssv to pick up the messages correctly.
Address review and fix proposal data: While the logic was correct for blinded block and pre-Deneb full blocks, it was not correct for Deneb and Electra blocks as blobs need to decided along with the block. This commit sets the data appropriately - but this requires a small change in LH which is pending approval: sigp/lighthouse#7497.
Improve fix for round timeouts: Round changes might be triggered by messages instead of timer ticks. In order to avoid incorrect timeouts after a round has been changed by a message (e.g. after replaying buffered messages), we calculate the currently adequate timeout every time before we wait for a message or round timeout.