@lamafab lamafab commented Mar 7, 2025

This PR introduces a standalone ActivationManager which implements our proposed Network Upgrade specification.

Early feedback on the specification and enforcement logic is appreciated.


EDIT (2025-03-20): Comments and recommendations have been addressed, ready to merge!

Changes:

  • Renamed threshold to approval_rate
  • Renamed accepting to is_compliant
  • Wrapped major/minor version indicators in new-types
    • respectively: RuntimeVersion { MajorVersion(u16), MinorVersion(u16) }
  • Adjusted the approval rate calculation: instead of checking min_validator_count independently, it is now part of the calculation itself:
        let total = votes.len().max(min_validator_count);
        let votes_received = votes.iter().filter(|(_, e)| e.vote == Vote::Aye).count();

        // Calculate percentage (0-100) of votes received
        let quorum = (votes_received * 100).div_ceil(total);

For example, if we require at least three votes and have received one Aye vote, the computed value (quorum in the snippet above) is 34, so a single check quorum >= 67 suffices, instead of having to check quorum >= 67 && votes_len >= 3.

EDIT (2025-03-17): PR is out of Draft and now ready for review!

Unit test coverage -> #611 (comment)

EDIT (2025-03-14): Completed comprehensive unit test suite:

  • Case: Vote Aye and wait for acceptance threshold
  • Case: Vote Aye but votes expire after sampling period; upgrade does not pass
  • Case: Vote Nay but still accept upgrade when conditions are met
  • Case: Vote Nay and refuse to upgrade (explicit fork)
  • Case: Ignore upgrade completely (implicit fork)
  • Case: Majority votes Nay but is willing to accept; upgrade does not pass
  • Case: Properly handle version transitions at exact target height
  • Case: Reject outdated version
  • Case: Reject unsupported (future) version
  • Case: Upgrade never passes due to insufficient conditions
  • Case: Lagging validator can catch-up (sync) with finalized upgrade
  • Case: Validators ignore votes on mismatched upgrade versions

Migrated and expanded all test cases using the improved testing structures! I'll add more documentation and do some final cleanups, and then we're done. I will convert this from a Draft to a PR when appropriate.

@arminsabouri arminsabouri mentioned this pull request Mar 14, 2025
@lamafab lamafab force-pushed the network-upgrades-part-1 branch 2 times, most recently from 1f07d2f to 0e9a1ba Compare March 17, 2025 14:40
@lamafab lamafab marked this pull request as ready for review March 17, 2025 14:41

lamafab commented Mar 17, 2025

Unit test coverage:

accept_lagging_validator.rs

/// Tests that a lagging validator can join the network after an upgrade has occurred.
///
/// This test verifies that:
/// 1. Two validators (ALICE, BOB) can successfully activate an upgrade when they meet all conditions
/// 2. A third validator (EVE) who joins later and is configured to accept the upgrade can 
///    successfully process blocks with the new version
/// 3. The lagging validator correctly rejects upgraded blocks during the process_proposal phase
///    but accepts them during finalize_block phase
/// 4. After finalizing the first upgraded block, the lagging validator properly updates its
///    active version and can fully participate in the network

reject_ignored_upgrade.rs

/// Tests that a validator can reject an upgrade that it ignores.
///
/// This test verifies that:
/// 1. Two validators (ALICE, BOB) vote for and are willing to accept an upgrade
/// 2. A third validator (EVE) ignores the upgrade entirely
/// 3. When conditions are met (2 validators required, height reached), ALICE proposes upgraded blocks
/// 4. EVE consistently rejects these upgraded blocks in both the process_proposal and finalize_block phases
/// 5. This leads to a consensus split where ALICE and BOB operate on the upgraded chain
///    while EVE rejects those blocks as dead ends

reject_mismatched_vote.rs

/// Tests that validators tracking different upgrade versions don't count each other's votes.
///
/// This test verifies that:
/// 1. Two validators (ALICE, BOB) vote for and track upgrade version 2.0
/// 2. A third validator (EVE) votes for a different upgrade version (3.0)
/// 3. The validators only count votes for their specific tracked version
/// 4. Even though all validators are voting "Aye", the upgrade doesn't activate because
///    from each perspective, the minimum validator requirement is not met
/// 5. This protects the network from confusion when multiple potential upgrades are being discussed

reject_outdated_or_unsupported_version.rs

/// Tests that validators reject blocks with outdated or unsupported versions.
///
/// This test verifies that:
/// 1. All validators successfully activate an upgrade to version 2.0
/// 2. EVE is then reset to use the previous version (1.0)
/// 3. When EVE proposes blocks with the outdated version:
///    - ALICE and BOB reject these blocks during process_proposal but accept them during finalize_block
///    - This behavior allows historical sync while preventing outdated block production
/// 4. When ALICE proposes blocks with the upgraded version:
///    - EVE rejects these blocks during both process_proposal and finalize_block
///    - This creates a consensus split where EVE cannot follow the upgraded chain

vote_expiration.rs

/// Tests that votes expire after the retention period and are no longer counted.
///
/// This test verifies that:
/// 1. All validators vote for and are willing to accept an upgrade
/// 2. With a reduced vote retention period of 10 blocks, votes begin to expire
/// 3. By block 12, BOB's vote (cast in block 1) expires and is no longer counted
/// 4. By block 13, EVE's vote (cast in block 2) also expires
/// 5. Even when the target height is reached, the upgrade doesn't activate because
///    expired votes reduce the validator count below the minimum requirement
/// 6. This ensures that upgrade decisions reflect recent consensus rather than outdated votes

vote_majority_nay_accepting_and_reject.rs

/// Tests that an upgrade is rejected when most validators vote Nay despite accepting.
///
/// This test verifies that:
/// 1. All validators are configured to accept an upgrade (accepting = true)
/// 2. ALICE votes Aye, but BOB and EVE vote Nay
/// 3. Despite having 100% of validators accepting the upgrade, it does not activate
///    because the Aye vote threshold (only 34%) is below the required quorum
/// 4. This demonstrates that both acceptance and explicit Aye votes are required
///    for an upgrade to activate

vote_nay_and_accept.rs

/// Tests that an upgrade activates even with a validator voting Nay but accepting.
///
/// This test verifies that:
/// 1. Two validators (ALICE, BOB) vote Aye for an upgrade
/// 2. One validator (EVE) votes Nay but is configured to accept the upgrade
/// 3. The upgrade still activates because:
///    - All validators are accepting the upgrade (100% acceptance)
///    - Aye votes reach 67%, meeting the quorum requirement
/// 4. After activation, all validators, including EVE, process blocks with the new version
/// 5. This shows that validators can signal disagreement while still accepting majority decisions

vote_nay_and_reject.rs

/// Tests that a validator can vote Nay and reject an upgrade.
///
/// This test verifies that:
/// 1. Two validators (ALICE, BOB) vote Aye and are willing to accept an upgrade
/// 2. One validator (EVE) votes Nay and is configured to reject the upgrade (accepting = false)
/// 3. When conditions are met, ALICE proposes upgraded blocks
/// 4. EVE consistently rejects these upgraded blocks in both process_proposal and finalize_block phases
/// 5. This leads to a consensus split where ALICE and BOB operate on the upgraded chain
///    while EVE rejects those blocks as dead ends

wait_for_accepting.rs

/// Tests the upgrade flow where validators first signal support, then later accept.
///
/// This test verifies that:
/// 1. All validators initially signal Aye votes but are not ready to accept the upgrade
/// 2. Despite unanimous Aye votes, the upgrade doesn't activate because no validators are accepting it
/// 3. After all validators update to accept the upgrade and additional voting occurs:
///    - By block 5, all conditions are met (100% Aye votes, 100% acceptance, target height reached)
///    - EVE proposes the first upgraded block which is accepted by all
/// 4. This demonstrates the two-phase upgrade process where validators can signal support
///    before they're technically ready to handle the upgrade

@lamafab lamafab force-pushed the network-upgrades-part-1 branch from 1bab3f0 to 9fb7815 Compare March 18, 2025 14:46

@rwlockbg rwlockbg left a comment

Great PR - so far this is the state machine logic basically and the underlying mechanism for accepting an upgrade. I presume next PRs will be actual integration. Left some comments, but overall looking nice and nice tests too!


scottmillner commented Mar 19, 2025

love the test fixture and test helper methods!! 🔥 🔥 🔥
awesome work with the very well defined spec and its implementation

putting my approval but little nit: do prefer @rwlockbg suggestion below

pub enum RuntimeVersion {  Major(u16), Minor(u16) }

scottmillner previously approved these changes Mar 19, 2025
@arminsabouri

@lamafab Do you mind squashing relevant commits (typos, lint, trigger CI). Ideally these changes are made into the commits that introduce them


lamafab commented Mar 20, 2025

@0xBEEFCAF3

@lamafab Do you mind squashing relevant commits (typos, lint, trigger CI). Ideally these changes are made into the commits that introduce them

I'll probably need to rewrite the entire history anyway after following up on Satwik's recommendations. I'll cherry-pick all changes and label them accordingly, then force-push. ETA EOD.


lamafab commented Mar 20, 2025

Regarding: https://github.com/botanix-labs/Macbeth/actions/runs/13976740327/job/39132293647?pr=611

TRY 1 TRMNTG [>120.000s] reth-authority-consensus comet_bft::abci::tests::test_finalize_block_with_signed_tx
   TRY 1 TMT [ 120.020s] reth-authority-consensus comet_bft::abci::tests::test_finalize_block_with_signed_tx
──── TRY 1 STDOUT:       reth-authority-consensus comet_bft::abci::tests::test_finalize_block_with_signed_tx

running 1 test
test comet_bft::abci::tests::test_finalize_block_with_signed_tx has been running for over 60 seconds
──── TRY 1 STDERR:       reth-authority-consensus comet_bft::abci::tests::test_finalize_block_with_signed_tx

thread 'comet_bft::abci::tests::test_finalize_block_with_signed_tx' panicked at crates/consensus/authority/src/comet_bft/abci.rs:1351:31:
Error building block in finalize block: Validation(EVM { hash: 0xb222209cbdf13b5574f4fb7ae7af4f0bf0abd614bf4c7de67b9ecfa117aca874, error: Transaction(InvalidChainId) })
stack backtrace:

I didn't make any changes in this area, and when running the command locally all tests pass:

cargo nextest run \
    --locked --features "ethereum conflicting_input" \
    --workspace --exclude example-custom-evm --exclude example-stateful-precompile --exclude ef-tests \
    -E "kind(lib) | kind(bin) | kind(proc-macro)"

I'm just going to re-run.

@scottmillner scottmillner self-requested a review March 20, 2025 22:58
rwlockbg previously approved these changes Mar 21, 2025

@rwlockbg rwlockbg left a comment

I am approving it, but you need to get the pipelines green.

@SatwikPrabhu SatwikPrabhu self-requested a review March 21, 2025 17:48
SatwikPrabhu previously approved these changes Mar 21, 2025
@lamafab lamafab dismissed stale reviews from SatwikPrabhu and rwlockbg via 220f199 March 21, 2025 23:44

lamafab commented Mar 21, 2025

I just renamed some fields in ConditionList, minor cosmetic changes.


@scottmillner scottmillner left a comment

i was able to get all the int tests to pass locally
so when there's another approval happy for it to go in

@lamafab lamafab merged commit 035eb53 into main Mar 22, 2025
14 of 18 checks passed
@scottmillner scottmillner deleted the network-upgrades-part-1 branch March 22, 2025 19:59

lamafab commented Dec 11, 2025

Botanix Network Upgrade Specification

(This spec has been revised with Claude.AI)

1. Introduction

This document outlines the specification for the Network Upgrade mechanism in the Botanix ecosystem. This mechanism enables coordinated updates to the consensus rules and network protocol while preserving blockchain safety and minimizing the risk of chain splits.

2. Terminology

  • Active Version: The currently enforced runtime version.
  • Upgrade Version: A proposed new runtime version that may become active after meeting required conditions.
  • Runtime Version: A pair of values (HARD, SOFT) where HARD represents breaking changes and SOFT represents backward-compatible changes.
  • Vote: A validator's explicit signal regarding a proposed upgrade (Aye, Nay, or Absent).
  • Compliance: A validator's readiness to process blocks with the upgrade version when conditions are met.
  • Quorum: The percentage of validators that must support an upgrade for it to be activated.
  • Target Height: The minimum block height at which an upgrade can activate if quorum is reached.

3. Components

3.1 ActivationManager

The ActivationManager MUST track network upgrade proposals, calculate approval rates, and determine when proposed upgrades should become active. It MUST interface with the ABCI component during block production and validation.

3.2 Network Upgrade Payload

Each block proposal MAY include a NetworkUpgradePayload within its Non-Deterministic Data (NDD) transaction, containing:

  • version: The proposed Runtime Version
  • vote: The block proposer's vote (Aye/Nay/Absent)
  • is_compliant: Whether the proposer is ready to accept the upgrade
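As a rough illustration of the payload described above, the three fields could be modeled as follows. This is only a sketch: the field names follow the list above, but the concrete types, the hard/soft field names, and the signal helper are assumptions made for this example, not the definitions used in the PR.

```rust
/// Hypothetical mirror of the (HARD, SOFT) pair from the Terminology section.
/// Deriving `Ord` compares `hard` first, then `soft`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct RuntimeVersion {
    pub hard: u16, // breaking changes
    pub soft: u16, // backward-compatible changes
}

/// A validator's explicit signal regarding a proposed upgrade.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Vote {
    Aye,
    Nay,
    Absent,
}

/// Assumed shape of the payload carried in the NDD transaction.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NetworkUpgradePayload {
    pub version: RuntimeVersion,
    pub vote: Vote,
    pub is_compliant: bool,
}

impl NetworkUpgradePayload {
    /// Illustrative helper: a signaling-only payload that votes but does
    /// not declare readiness to process upgraded blocks.
    pub fn signal(version: RuntimeVersion, vote: Vote) -> Self {
        Self { version, vote, is_compliant: false }
    }
}
```

Because a block proposal MAY omit the payload, a caller would likely carry it as an Option of this struct.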

4. Configuration

4.1 Constants

The system defines the following global constants:

  • MIN_QUORUM: 67% - The minimum approval rate required for an upgrade to activate
  • MIN_VALIDATOR_COUNT: 3 - The minimum number of distinct validators that must participate in voting
  • VOTE_RETENTION_PERIOD: 518,400 blocks - The period for which votes are retained (approximately 30 days at 12 blocks per minute)

These global constants ensure network security by requiring a supermajority consensus across a minimum number of validators before any upgrade can proceed, while maintaining a reasonable voting window.
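The constants above can be written down directly, and the "approximately 30 days" figure checked against the stated block rate. The integer types and the retention_days helper are assumptions for this sketch; only the three constant names and values come from the spec.

```rust
/// Minimum approval rate, in percent.
pub const MIN_QUORUM: u64 = 67;
/// Minimum number of distinct validators that must participate.
pub const MIN_VALIDATOR_COUNT: usize = 3;
/// Number of blocks for which a vote is retained.
pub const VOTE_RETENTION_PERIOD: u64 = 518_400;

// Block rate taken from the spec text (12 blocks per minute).
const BLOCKS_PER_MINUTE: u64 = 12;
const MINUTES_PER_DAY: u64 = 60 * 24;

/// Hypothetical helper: converts the retention period into whole days.
pub const fn retention_days() -> u64 {
    VOTE_RETENTION_PERIOD / (BLOCKS_PER_MINUTE * MINUTES_PER_DAY)
}
```

At 12 blocks per minute there are 17,280 blocks per day, and 518,400 / 17,280 is exactly 30.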

4.2 Default Configuration

By default and for the majority of a node's lifetime, the ActivationManager MUST be built with:

  • build_ignore_network_upgrade(): Continue with current version, ignore all upgrade proposals

This configuration:

  • Does not participate in any upgrade voting process
  • Leaves the NetworkUpgradePayload in the NDD empty
  • MUST reject any blocks with upgraded versions
  • Results in a consensus split if an upgrade activates on the network

Node operators SHOULD be aware that continuing to operate without upgrading after a network upgrade has activated results in undefined network behavior.

4.3 Upgrade Signaling

When a community-driven upgrade proposal is being discussed, validators MAY signal their intentions by configuring the ActivationManager using:

  • build_signal_network_upgrade(upgrade_version, vote): Signal vote without accepting the upgrade

Key characteristics:

  • This configuration SHOULD be possible without requiring a software update
  • The runtime version and vote parameters SHOULD be configurable via CLI flags, environment variables, or configuration files
  • The NetworkUpgradePayload in the NDD includes:
    • The specified upgrade_version
    • The validator's vote
    • The is_compliant parameter set to false, indicating readiness to signal but not to process upgraded blocks
  • Blocks proposing the upgrade version MUST be rejected even if quorum is reached

4.4 Upgrade Acceptance

If an upgrade proposal reaches community consensus and is scheduled for activation, the maintainers MUST release a new Botanix node version with the ActivationManager using:

  • build_accept_network_upgrade(upgrade_version, quorum, target_height, vote): Signal vote and accept the upgrade if conditions are met

Key characteristics:

  • The release MUST include all necessary logic, code changes, and migrations required to fully support the new runtime version.
  • The upgrade parameters MUST be hardcoded in this release to ensure consistency.
  • Sets NetworkUpgradePayload with both the validator's vote AND is_compliant=true
  • Enables the validator to process blocks with the upgraded version once all upgrade conditions are met
  • Automatically transitions the node to the new version once the first upgraded block is finalized
  • Prunes voting data after the transition to start fresh with the new version

Validators MUST update to this new version before all upgrade conditions are met to participate in the coordinated upgrade process.
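The three configurations in sections 4.2 to 4.4 differ mainly in what they put into the payload and whether they will ever process upgraded blocks. One way to picture this is a single enum. The Mode type, its methods, and the tuple-based version are hypothetical and exist only for this sketch; the PR exposes builder functions instead.

```rust
/// (HARD, SOFT), simplified to a tuple for this sketch.
pub type RuntimeVersion = (u16, u16);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Vote {
    Aye,
    Nay,
    Absent,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Payload {
    pub version: RuntimeVersion,
    pub vote: Vote,
    pub is_compliant: bool,
}

/// Hypothetical stand-in for the three builder configurations.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    /// 4.2: ignore all upgrade proposals.
    Ignore,
    /// 4.3: signal a vote without accepting the upgrade.
    Signal { version: RuntimeVersion, vote: Vote },
    /// 4.4: signal a vote and accept the upgrade once conditions are met.
    Accept { version: RuntimeVersion, quorum: u8, target_height: u64, vote: Vote },
}

impl Mode {
    /// Payload this configuration would place in the NDD, if any.
    pub fn payload(&self) -> Option<Payload> {
        match *self {
            Mode::Ignore => None,
            Mode::Signal { version, vote } => {
                Some(Payload { version, vote, is_compliant: false })
            }
            Mode::Accept { version, vote, .. } => {
                Some(Payload { version, vote, is_compliant: true })
            }
        }
    }

    /// Only the accepting configuration may ever process upgraded blocks.
    pub fn may_process_upgraded_blocks(&self) -> bool {
        matches!(self, Mode::Accept { .. })
    }
}
```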

5. Upgrade Conditions

Blocks with the upgraded version MUST only be proposed and backed if ALL of the following are true:

  • The validator is configured to accept the upgrade
    • Config parameter: is_compliant
  • The network has reached sufficient quorum of Aye votes for the upgrade
    • Config parameter: quorum
    • Must be greater than or equal to MIN_QUORUM
  • The network has reached sufficient quorum of compliant validators for the upgrade
    • Config parameter: quorum
    • Must be greater than or equal to MIN_QUORUM
  • The current block height is at or above the scheduled target height
    • Config parameter: target_height
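The four conditions above combine with a plain logical AND. A minimal sketch, assuming the two approval rates have already been computed as whole percentages; UpgradeConfig, NetworkState, and upgrade_conditions_met are illustrative names, not the PR's API.

```rust
pub const MIN_QUORUM: u8 = 67;

/// Hypothetical local configuration of an accepting validator.
#[derive(Debug, Clone, Copy)]
pub struct UpgradeConfig {
    pub is_compliant: bool,
    pub quorum: u8,
    pub target_height: u64,
}

impl UpgradeConfig {
    /// Returns `None` when the requested quorum is below `MIN_QUORUM`.
    pub fn new(is_compliant: bool, quorum: u8, target_height: u64) -> Option<Self> {
        if quorum < MIN_QUORUM {
            return None;
        }
        Some(Self { is_compliant, quorum, target_height })
    }
}

/// Hypothetical snapshot of what the validator currently observes.
#[derive(Debug, Clone, Copy)]
pub struct NetworkState {
    pub aye_rate: u8,        // percent of Aye votes
    pub compliance_rate: u8, // percent of compliant validators
    pub height: u64,         // current block height
}

/// True only if ALL upgrade conditions hold.
pub fn upgrade_conditions_met(cfg: &UpgradeConfig, state: &NetworkState) -> bool {
    cfg.is_compliant
        && state.aye_rate >= cfg.quorum
        && state.compliance_rate >= cfg.quorum
        && state.height >= cfg.target_height
}
```

This mirrors the vote_majority_nay_accepting_and_reject case: with 100% compliance but only 34% Aye votes, the function returns false.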

5.1 Approval Rate Calculation

The ActivationManager maintains two separate approval rate calculations:

  1. Aye Approval Rate: Percentage of validators voting Aye, regardless of compliance status.
  2. Compliance Approval Rate: Percentage of validators with is_compliant=true, regardless of vote.

By requiring both approval rates to meet quorum, we ensure upgrades only proceed when there is both sufficient community support AND operational readiness. This allows validators to signal support for an upgrade they aren't yet ready to implement, or to prepare for an upgrade they oppose (but will follow if passed by the community).

For the calculation, the following values must be retrieved:

$$ \begin{align*} V &= \set{v_1, v_2, \ldots, v_n} \text{ the set of all votes} \\ Y &= \set{v \in V \mid v \text{ is \textit{Aye}}} \text{ the subset of Aye votes} \\ C &= \set{v \in V \mid v \text{ is \textit{compliant}}} \text{ the subset of \textit{compliant} votes} \\ y &= |Y| \text{ the number of \textit{Aye} votes} \\ c &= |C| \text{ the number of \textit{compliant} validators} \\ n &= |V| \text{ the total number of votes} \\ m &= \texttt{MIN-VALIDATOR-COUNT} \text{ the minimum required validator count} \\ T &= \max(n, m) \text{ the effective total for percentage calculation} \end{align*} $$

The exact formula for calculating the aye approval rate is:

$$ \begin{equation} t^{y} = \left\lceil \frac{y \times 100}{T} \right\rceil \end{equation} $$

The exact formula for calculating the compliance approval rate is:

$$ \begin{equation} t^{c} = \left\lceil \frac{c \times 100}{T} \right\rceil \end{equation} $$

The quorum is considered met if and only if:

  1. $t^{y} \geq {}$ quorum - The percentage of Aye votes meets or exceeds the minimum required threshold
  2. $t^{c} \geq {}$ quorum - The percentage of compliant validators meets or exceeds the minimum required threshold

The ceiling function $\lceil \cdot \rceil$ in the formula ensures we round up to the nearest percentage point, implementing ceiling integer division. Using $\max(n, m)$ as the denominator ensures that when there are fewer validators voting than the minimum required count, we calculate the percentage against the minimum count, making it more difficult to reach quorum with only a few validators.
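The two formulas can be sketched in a few lines of Rust, following the div_ceil approach from the PR description. The VoteEntry struct and the function names are assumptions for this example, and the code assumes min_validator_count is at least 1 so the denominator is never zero.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Vote {
    Aye,
    Nay,
    Absent,
}

/// Hypothetical record of one validator's latest vote.
#[derive(Debug, Clone, Copy)]
pub struct VoteEntry {
    pub vote: Vote,
    pub is_compliant: bool,
}

/// ceil(count * 100 / max(n, m)), using integer arithmetic only.
/// Assumes `min_validator_count >= 1`, so the divisor is non-zero.
fn rate(count: usize, n: usize, min_validator_count: usize) -> usize {
    let total = n.max(min_validator_count);
    (count * 100).div_ceil(total)
}

/// Returns (aye approval rate, compliance approval rate) in percent.
pub fn approval_rates(votes: &[VoteEntry], min_validator_count: usize) -> (usize, usize) {
    let n = votes.len();
    let ayes = votes.iter().filter(|e| e.vote == Vote::Aye).count();
    let compliant = votes.iter().filter(|e| e.is_compliant).count();
    (
        rate(ayes, n, min_validator_count),
        rate(compliant, n, min_validator_count),
    )
}
```

With a single compliant Aye vote and a minimum of three validators, both rates come out at 34; with two Aye votes and one Nay, all compliant, the result is (67, 100).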

6. Process Flow for Compliant Validators

This process flow describes how the ActivationManager MUST be integrated into CometBFT's block production and finalization methods.

6.2 Block Preparation

  1. During CometBFT's prepare_proposal, an ActivationManager MUST:
    • Return the appropriate block version to propose (active or upgrade)
    • Include the proposer's vote on pending upgrades, if any.
    • Propose an upgraded block only when all upgrade conditions are met.

6.3 Block Validation

  1. During CometBFT's process_proposal, all validators MUST:

    • Accept blocks with the active version
    • Accept blocks with an upgrade version only when all upgrade conditions are met.
    • Reject blocks with unknown versions, or with an upgrade version while not all upgrade conditions are met.
  2. Validators configured to reject an upgrade MUST NOT process blocks with the upgraded version, regardless of network support levels.
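The validation rules above reduce to a small decision function. This is a sketch under assumptions: the function name, its parameters, and the tuple version type are hypothetical, and conditions_met stands for the result of the upgrade condition check from section 5.

```rust
/// (HARD, SOFT), simplified to a tuple for this sketch.
pub type RuntimeVersion = (u16, u16);

/// Decides whether a proposed block version is acceptable during
/// process_proposal.
///
/// `accepted_upgrade` is `None` for validators that ignore, merely signal,
/// or reject the upgrade, so they never accept upgraded blocks here.
pub fn accept_in_process_proposal(
    active: RuntimeVersion,
    accepted_upgrade: Option<RuntimeVersion>,
    conditions_met: bool,
    block_version: RuntimeVersion,
) -> bool {
    if block_version == active {
        // Blocks with the active version are always acceptable.
        return true;
    }
    // Upgraded blocks are acceptable only for the configured upgrade
    // version, and only once every upgrade condition holds.
    accepted_upgrade == Some(block_version) && conditions_met
}
```

Note that an outdated version is rejected at this stage, which matches the reject_outdated_or_unsupported_version test: such blocks are refused during process_proposal even though finalize_block later tolerates them for historical sync.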

6.4 Block Finalization

  1. During CometBFT's finalize_block, validators MUST:

    • Track the proposer's vote for network upgrades
    • Prune outdated votes by removing all votes from blocks older than (current_height - VOTE_RETENTION_PERIOD)
    • Accept any block version that is less than or equal to the active version
      • This mechanism is required for historical sync
    • Update their active version if an upgraded block is finalized
    • Reject blocks if explicitly configured to reject that upgrade
    • Clean up all voting data once a version transition occurs or when the upgrade block is rejected
  2. When a validator finalizes a block with an upgraded version, it MUST set that version as the new active version for all future block production and validation.

  3. Validators not accepting an upgrade MUST deliberately reject upgraded blocks, resulting in a consensus split and undefined network behavior.
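The pruning step in the list above can be sketched with a map keyed by validator. The TrackedVote struct, the String key, and the function name are assumptions for this example; the cutoff follows the rule of removing votes from blocks older than current_height minus the retention period.

```rust
use std::collections::HashMap;

/// Hypothetical record of a tracked vote and the height it was cast at.
#[derive(Debug, Clone, Copy)]
pub struct TrackedVote {
    pub height: u64,
    pub aye: bool,
}

/// Drops every vote cast in a block older than
/// `current_height - retention_period`.
pub fn prune_votes(
    votes: &mut HashMap<String, TrackedVote>,
    current_height: u64,
    retention_period: u64,
) {
    // saturating_sub avoids underflow while the chain is still shorter
    // than the retention period.
    let cutoff = current_height.saturating_sub(retention_period);
    votes.retain(|_, v| v.height >= cutoff);
}
```

With a retention period of 10, a vote cast in block 1 survives pruning at height 11 and is dropped at height 12, which lines up with the vote_expiration test described earlier in this thread.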

6.4.1 Syncing Considerations

The on_finalize_block method MUST NOT repeat upgrade condition checks for the following reasons:

  • Syncing nodes lack historical context about previous network state and voting patterns
  • Blocks finalized by CometBFT consensus have already been endorsed by a supermajority of validators
  • By the time a block reaches finalization, its validity including version transitions has been confirmed
  • Re-validating these conditions during sync would require maintaining and processing historical vote data that may no longer be available, and would introduce additional complexity

This separation of responsibilities ensures that nodes can reliably sync the canonical chain while maintaining the ability to reject future unwanted upgrades.
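To make the difference from process_proposal concrete, here is a sketch of a finalization step that deliberately skips the quorum and height checks. The Manager struct, the Finalized enum, and the tuple version type are hypothetical; only the method name on_finalize_block is taken from the text above, and its signature here is an assumption.

```rust
/// (HARD, SOFT), simplified to a tuple for this sketch.
pub type RuntimeVersion = (u16, u16);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Finalized {
    /// Block version is at or below the active version (historical sync).
    Accepted,
    /// Block carried the accepted upgrade; the active version was bumped.
    Upgraded,
    /// Block carried a version this validator does not accept.
    Rejected,
}

/// Hypothetical, stripped-down view of the manager's state.
pub struct Manager {
    pub active: RuntimeVersion,
    /// `Some` only when the node is configured to accept that upgrade.
    pub accepted_upgrade: Option<RuntimeVersion>,
}

impl Manager {
    /// No approval-rate or target-height checks happen here: a finalized
    /// block has already been endorsed by CometBFT consensus.
    pub fn on_finalize_block(&mut self, block_version: RuntimeVersion) -> Finalized {
        if block_version <= self.active {
            return Finalized::Accepted;
        }
        if self.accepted_upgrade == Some(block_version) {
            // Version transition: adopt the new version and start fresh.
            self.active = block_version;
            self.accepted_upgrade = None;
            return Finalized::Upgraded;
        }
        Finalized::Rejected
    }
}
```

A lagging validator configured to accept version (2, 0) therefore upgrades on the first finalized (2, 0) block it sees, while a validator that ignores the upgrade keeps rejecting it.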
