Skip to content

Conversation

dknopik
Copy link
Member

@dknopik dknopik commented May 12, 2025

This PR fixes proposals! The commits fix several issues, I am not sure which are strictly necessary to fix proposals, but each is an improvement anyway.

Decode DataSsz variants explicitly based on duty and fork: Currently, Fulu and Electra blocks are identical when serialized in SSZ. Therefore, when we deserialize/deserialise the block, from_ssz_bytes accidentally decodes the block into the Fulu variant, causing us to send a Fulu block to the BN, which in turn refuses to publish it. The commit explicitly decodes the block using the version field from the decided data and similarly decodes other objects directly as well in other duties (to be safe).

Set max_transmit_size: Some messages with large blocks failed to be sent as they are quite large - we set max_transmit_size to a large enough value of 5 million bytes.

Outdated, see "Improve fix for round timeouts" below - Avoid conflict between timer-based and message-based round increments: If we receive a quorum of round change messages, we set our own round to the round mentioned in these messages. However, if a timer tick arrives after, it increased the round regardless - even though actually it meant to increase the round to the round found in those messages. We fix this by tracking the round requested by ticks separately.

Remember PREPARE even if we have not received the PROPOSE yet: We discarded PREPARE messages if we have not received a PROPOSE message yet - which in some cases resulted in missed quorums. We record unknown prepare messages, but still only check for quorum if we have the PROPOSE message.

Use current slot for RANDAO Reveal: We need to use the current slot instead of the last slot of the epoch for go-ssv to pick up the messages correctly.

Address review and fix proposal data: While the logic was correct for blinded block and pre-Deneb full blocks, it was not correct for Deneb and Electra blocks as blobs need to decided along with the block. This commit sets the data appropriately - but this requires a small change in LH which is pending approval: sigp/lighthouse#7497.

Improve fix for round timeouts: Round changes might be triggered by messages instead of timer ticks. In order to avoid incorrect timeouts after a round has been changed by a message (e.g. after replaying buffered messages), we calculate the currently adequate timeout every time before we wait for a message or round timeout.

@dknopik dknopik added the bug Something isn't working label May 12, 2025
@dknopik dknopik marked this pull request as ready for review May 13, 2025 08:29
Copy link

cla-assistant bot commented May 13, 2025

CLA assistant check
All committers have signed the CLA.

@diegomrsantos diegomrsantos requested a review from Copilot May 13, 2025 10:48
Copilot

This comment was marked as resolved.

Copy link
Member

@Zacholme7 Zacholme7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Diego covered pretty much everything. Also just seconded some small terminology clarification on my end.

@diegomrsantos
Copy link
Contributor

@dknopik, can we add tests to be sure (and avoid regressions) that there's no race related to the round timeout?

@diegomrsantos diegomrsantos requested a review from Copilot May 23, 2025 19:49
Copilot

This comment was marked as resolved.

@dknopik
Copy link
Member Author

dknopik commented May 23, 2025

@dknopik, can we add tests to be sure (and avoid regressions) that there's no race related to the round timeout?

Sure. Do you want me to do so? If you want, you can take a stab at it instead - that might prevent issues like me "overfitting" the test on my mental model of the issue.

@dknopik dknopik added v0.1.0 First Testnet-only release and removed do-not-merge labels May 27, 2025
@diegomrsantos
Copy link
Contributor

@dknopik, can we add tests to be sure (and avoid regressions) that there's no race related to the round timeout?

Sure. Do you want me to do so? If you want, you can take a stab at it instead - that might prevent issues like me "overfitting" the test on my mental model of the issue.

Feel free to do it.

diegomrsantos
diegomrsantos previously approved these changes May 28, 2025
Copy link
Contributor

@diegomrsantos diegomrsantos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work on this one, it's really great.

@dknopik
Copy link
Member Author

dknopik commented May 28, 2025

@diegomrsantos thanks for the idea about a test - I found another minor data race thanks to it.

We removed the message round check in the qbft crate as we now check for this in message validation. However, as the message might be handled by the qbft crate after the round has advanced, this caused an interesting issue where a message belonging to a previous round change arrived and caused an instance to revert to a previous round.

I reintroduced the round check.

@dknopik dknopik mentioned this pull request May 28, 2025
@diegomrsantos diegomrsantos requested a review from Copilot May 28, 2025 16:48
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors block proposal signing, improves round timeout handling, increases network message size limits, and updates SSZ version encoding.

  • Introduces a SignableBlock trait and rewrites decide_abstract_block/sign_abstract_block to handle fork-aware SSZ and unified unsigned/signed block types.
  • Enhances QBFT manager to compute timeouts dynamically, adds tests for late-initialization scenarios, and removes static round_end.
  • Adds a 5 MiB max_transmit_size, replaces raw DataVersion constants with a ForkName-wrapped type, and updates SSZ encode/decode accordingly.

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
validator_store/src/lib.rs Refactored block decision/signing with SignableBlock trait
qbft_manager/src/tests.rs Added test timeout, late-initialization parameter and test
qbft_manager/src/instance.rs Removed static round_end, compute timeout dynamically
network/src/network.rs Set max_transmit_size to 5 000 000 bytes
common/ssv_types/src/consensus.rs Replaced DataVersion/DataSsz with ForkName-based wrapper
common/qbft/src/lib.rs Adjusted get_round return type and added round-check in message handling
Comments suppressed due to low confidence (3)

anchor/validator_store/src/lib.rs:388

  • [nitpick] The parameter name signable_block and local variable block are reused and shadowed by subsequent let block = ... bindings, which can be confusing. Consider renaming the parameter to proposal or payload and the result variable to unsigned_block for clarity.
signable_block: impl SignableBlock<E>,

anchor/qbft_manager/src/tests.rs:129

  • [nitpick] This test panics on timeout, which aborts the suite. Consider using assert!(false, "test timed out") or emitting a clearer failure message so that test harnesses report it as a failed assertion rather than an uncaught panic.
let timeout = sleep(TEST_TIMEOUT);

anchor/validator_store/src/lib.rs:413

  • Calling .into() on &ForkName may not resolve the intended DataVersion conversion if only From<ForkName> is implemented. Consider explicitly cloning or copying the ForkName (e.g., block.fork_name_unchecked().clone().into()) to avoid type inference issues.
let block_version = block.fork_name_unchecked().into();

@dknopik dknopik merged commit 81f5e1e into sigp:unstable May 29, 2025
12 checks passed
@dknopik dknopik deleted the proposal-fixes branch June 20, 2025 13:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working network QBFT v0.1.0 First Testnet-only release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants