Conversation

@jimmygchen (Member) commented Mar 25, 2025

Issue Addressed

This PR adds transitions to Electra and Fulu fork epochs in the simulator tests.

It also covers blob inclusion verification and data column syncing on a full node in Fulu.

UPDATE: Removed the Fulu fork from the sim tests due to #7199 (comment)

@jimmygchen jimmygchen added the test improvement Improve tests label Mar 25, 2025
@jimmygchen (Member Author)

On my laptop, a Fulu block takes up to 5s to import, and we currently run with a 2s slot time (speed-up-factor=2). This should get better once we offload computation from the beacon node, but we may need to increase the slot time or reduce the number of blobs or nodes for this test to work.

@jimmygchen (Member Author)

I've tried slowing down the tests and am getting better results, but it's still not perfect:

2025-03-25T11:39:48.3641052Z Mar 25 11:39:48.360 DEBUG Successfully verified gossip data column sidecar  slot: 131, block_root: 0xc196783f41eb37f8b4ef4cc4220f6f3776c0a73ef5e98d6d5d86e01b8ccec957, index: 113 
2025-03-25T11:39:48.3642654Z thread 'main' panicked at testing/simulator/src/basic_sim.rs:387:41:
2025-03-25T11:39:48.3643604Z called `Result::unwrap()` on an `Err` value: "There wasn't a block produced at every slot, got: 112, expected: 129"

I think we'll have to wait until #7117 is merged (offload proof computation from BN).

@jimmygchen jimmygchen added the ready-for-review The code is ready for review label Apr 9, 2025
@jimmygchen (Member Author)

@dapplion this PR should be unblocked now that #7117 is merged - please review when you have time. This will give us some more e2e test coverage on PeerDAS syncing.

@jimmygchen jimmygchen requested a review from dapplion April 9, 2025 15:30
@jimmygchen jimmygchen added das Data Availability Sampling electra Required for the Electra/Prague fork labels Apr 9, 2025
Squashed commit of the following:

commit 008ad45
Author: Jimmy Chen <[email protected]>
Date:   Thu Apr 10 14:18:16 2025 +1000

    Rewrite to avoid lock contention during computation.

commit 5fa537b
Author: Jimmy Chen <[email protected]>
Date:   Thu Apr 10 13:38:22 2025 +1000

    Simplify beacon proposer cache API.

commit 1106419
Author: Jimmy Chen <[email protected]>
Date:   Thu Apr 10 12:55:10 2025 +1000

    Compute proposer shuffling only once in gossip verification.
@jimmygchen jimmygchen force-pushed the add-electra-fulu-sim-test-support branch from 44ae671 to 03d4866 Compare April 11, 2025 07:18
@jimmygchen (Member Author)

After merging #7117 this seems to be doing a lot better; however, it's still missing a few slots.
I've merged in #7304 to see if it improves things (cc @pawanjay176).

@jimmygchen (Member Author)

Looks like we're really close after merging the two PRs mentioned above - only missing one slot now.

thread 'main' panicked at testing/simulator/src/basic_sim.rs:388:41:
called `Result::unwrap()` on an `Err` value: "There wasn't a block produced at every slot, got: 128, expected: 129, missed: [119]"
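
The panic above comes from the simulator's block-production check. A minimal sketch of that kind of check is below; the function name and signature are illustrative, not the actual code in `basic_sim.rs`:

```rust
// Hypothetical reconstruction of the simulator's block-production check:
// every slot up to the expected count must have produced a block, otherwise
// return an error listing the missed slots.
fn check_block_production(produced_slots: &[u64], expected: u64) -> Result<(), String> {
    let missed: Vec<u64> = (1..=expected)
        .filter(|slot| !produced_slots.contains(slot))
        .collect();
    if missed.is_empty() {
        Ok(())
    } else {
        Err(format!(
            "There wasn't a block produced at every slot, got: {}, expected: {}, missed: {:?}",
            produced_slots.len(),
            expected,
            missed
        ))
    }
}

fn main() {
    // Reproduce the failure above: slot 119 missing out of 129 expected slots.
    let produced: Vec<u64> = (1..=129).filter(|&s| s != 119).collect();
    let err = check_block_production(&produced, 129).unwrap_err();
    assert!(err.contains("got: 128, expected: 129, missed: [119]"));
}
```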

@jimmygchen (Member Author)

Note: we don't have getBlobsV2 in the EL version being used, which may improve things, but I think it would be good to make sure this works without it.

@jimmygchen jimmygchen force-pushed the add-electra-fulu-sim-test-support branch from 03d4866 to 7a0a25a Compare April 11, 2025 16:07
@dapplion (Collaborator) left a comment

LGTM

@jimmygchen (Member Author)

So I pushed one more change to reduce the likelihood of missed blocks:

  • changed the number of supernodes from 6 to 2 (1 BN and 1 proposer-only node)
  • changed the speed-up factor from 3 to 2, which changes the slot time from 2s to 3s

If this still doesn't work, we'd need to limit the number of blobs we produce. The current setup with 12 blobs, 6 supernodes and 2s slots isn't feasible on a CI runner.
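
The relationship between the speed-up factor and the slot time can be sketched as below, assuming a 6s base slot duration (as implied by the 3→2s and 2→3s mapping above); the names are illustrative, not the actual simulator API:

```rust
use std::time::Duration;

// Assumed base slot duration; the simulator divides this by the
// speed-up factor to run faster than real time.
const BASE_SLOT_DURATION_SECS: u64 = 6;

fn slot_duration(speed_up_factor: u64) -> Duration {
    Duration::from_secs(BASE_SLOT_DURATION_SECS / speed_up_factor)
}

fn main() {
    assert_eq!(slot_duration(3), Duration::from_secs(2)); // previous setting: 2s slots
    assert_eq!(slot_duration(2), Duration::from_secs(3)); // new setting: 3s slots
}
```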

…sim-test-support

# Conflicts:
#	beacon_node/beacon_chain/src/data_column_verification.rs
@jimmygchen jimmygchen force-pushed the add-electra-fulu-sim-test-support branch from 86ffe8b to cc87b22 Compare May 1, 2025 05:43
@jimmygchen jimmygchen force-pushed the add-electra-fulu-sim-test-support branch from cc87b22 to 87a2c3a Compare May 1, 2025 05:54
@jimmygchen jimmygchen force-pushed the add-electra-fulu-sim-test-support branch from 7e0d76c to dc7b310 Compare May 1, 2025 07:07
@jimmygchen (Member Author)

I've noticed that we only get delayed head blocks after the Fulu fork:

May 05 16:49:25.086 DEBUG Delayed head block                            block_root: 0x250eb95df22daa9be51b6f77ba52ccba9e95e7d7b499be2c2d9e224fd5c98a10, proposer_index: 21, slot: 9, total_delay_ms: 1086, observed_delay_ms: "175", blob_delay_ms: "unknown", consensus_time_ms: "4", execution_time_ms: "0", available_delay_ms: "180", attestable_delay_ms: "1074", imported_time_ms: "903", set_as_head_time_ms: "1" 

and note the differences in verify_kzg between nodes (2ms to 23ms for the exact same column):

May 05 16:49:24.534 DEBUG Data column validation timing                 total: 2, slot: 9, index: 121, timing: TimingInfo { verify_sidecar: 42ns, verify_subnet: 41ns, verify_future: 125ns, verify_finalized: 334ns, verify_first: 41ns, verify_inclusion: 2.792µs, verify_parent: 791ns, verify_slot: 42ns, verify_proposer: 726.083µs, verify_kzg: 2.20475ms, observe_slashable: 1.25µs, observe_gossip: 209ns } 
May 05 16:49:24.991 DEBUG Data column validation timing                 total: 12, slot: 9, index: 121, timing: TimingInfo { verify_sidecar: 41ns, verify_subnet: 0ns, verify_future: 333ns, verify_finalized: 458ns, verify_first: 416ns, verify_inclusion: 7.375µs, verify_parent: 1.791µs, verify_slot: 42ns, verify_proposer: 3.344792ms, verify_kzg: 8.668625ms, observe_slashable: 1.542µs, observe_gossip: 208ns } 
May 05 16:49:25.049 DEBUG Data column validation timing                 total: 26, slot: 9, index: 121, timing: TimingInfo { verify_sidecar: 42ns, verify_subnet: 125ns, verify_future: 292ns, verify_finalized: 542ns, verify_first: 416ns, verify_inclusion: 8µs, verify_parent: 1.833µs, verify_slot: 125ns, verify_proposer: 2.367833ms, verify_kzg: 23.76425ms, observe_slashable: 1.416µs, observe_gossip: 250ns } 

I wanted to be sure we're not slowed down by locks, but seeing the differences in verify_kzg time, this is more likely an issue with the limited resources of the local setup. The worst case for verifying KZG in the sim tests is about ~100ms for a single column with up to 9 blobs; this is really high and should ideally be under 5ms for 6s slots.

It does perform better under Kurtosis though - I don't see the same issue using a similar setup under Kurtosis (2 supernodes, 2 full nodes, 6s slots), with no missed slots for many epochs. I'm guessing the Kurtosis setup:

  1. allocates resources more evenly; OR
  2. has getBlobsV2 with a real EL, which allows skipping KZG verification in the CL.

I think this test would work if we set the slot time to more than 6 seconds, but it would take a lot longer - a run with 6s slots takes ~30min, and it doesn't feel like it's worth it. We should consider writing a Kurtosis test for this.

For now it probably makes sense to get the Electra fork and the bug fixes in. I'll remove the Fulu changes from the sim tests and we can move on from this.
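
The per-stage timing in the "Data column validation timing" logs above follows a common instrumentation pattern: record an `Instant` around each verification step and log the elapsed durations. A minimal sketch, with closures standing in for the real checks:

```rust
use std::time::{Duration, Instant};

// Run a closure and return its result along with the elapsed wall time.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = f();
    (result, start.elapsed())
}

fn main() {
    // Stand-in for the inclusion-proof check (cheap in the logs above).
    let (_ok, verify_inclusion) = timed(|| (0..1_000u64).sum::<u64>());
    // Stand-in for KZG proof verification (the dominant cost in the logs).
    let (_ok, verify_kzg) = timed(|| (0..100_000).map(|x: u64| x.wrapping_mul(x)).sum::<u64>());
    println!(
        "timing: verify_inclusion: {:?}, verify_kzg: {:?}",
        verify_inclusion, verify_kzg
    );
}
```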

@jimmygchen jimmygchen changed the title Add Electra and Fulu forks to basic sim tests Add Electra ~~and Fulu~~ forks to basic sim tests May 5, 2025
@jimmygchen jimmygchen changed the title Add Electra ~~and Fulu~~ forks to basic sim tests Add Electra forks to basic sim tests May 5, 2025
@jimmygchen jimmygchen removed the das Data Availability Sampling label May 5, 2025
@jimmygchen jimmygchen force-pushed the add-electra-fulu-sim-test-support branch from 92b9153 to d485a77 Compare May 5, 2025 20:48
@jimmygchen jimmygchen force-pushed the add-electra-fulu-sim-test-support branch from d485a77 to 75b4b52 Compare May 5, 2025 20:49
@jimmygchen jimmygchen requested a review from pawanjay176 May 5, 2025 23:25
@@ -589,19 +588,19 @@ fn verify_proposer_and_signature<T: BeaconChainTypes>(
chain: &BeaconChain<T>,
) -> Result<(), GossipDataColumnError> {
let column_slot = data_column.slot();
let column_epoch = column_slot.epoch(E::slots_per_epoch());
let slots_per_epoch = T::EthSpec::slots_per_epoch();
let column_epoch = column_slot.epoch(slots_per_epoch);
@jimmygchen (Member Author)

This fixes a bug with the minimal preset that was causing a proposer shuffling cache miss for every gossip column.
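
A plausible reading of the diff above: computing the column's epoch with a hardcoded mainnet constant instead of the chain's actual spec gives the wrong epoch under the minimal preset, so the proposer shuffling cached under the correct epoch is never found. A simplified illustration (constants and names are for demonstration, not the actual Lighthouse types):

```rust
// Mainnet preset uses 32 slots per epoch; minimal preset uses 8.
const MAINNET_SLOTS_PER_EPOCH: u64 = 32;
const MINIMAL_SLOTS_PER_EPOCH: u64 = 8;

fn epoch_of_slot(slot: u64, slots_per_epoch: u64) -> u64 {
    slot / slots_per_epoch
}

fn main() {
    let column_slot = 9;
    // Bug: hardcoded mainnet constant on a minimal-preset chain.
    let wrong_epoch = epoch_of_slot(column_slot, MAINNET_SLOTS_PER_EPOCH);
    // Fix: use the chain's actual slots-per-epoch.
    let right_epoch = epoch_of_slot(column_slot, MINIMAL_SLOTS_PER_EPOCH);
    assert_eq!(wrong_epoch, 0);
    assert_eq!(right_epoch, 1);
    // A shuffling cache keyed by epoch would miss on every lookup.
    assert_ne!(wrong_epoch, right_epoch);
}
```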

@jimmygchen jimmygchen added the das Data Availability Sampling label May 8, 2025
@jimmygchen jimmygchen added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels May 8, 2025
mergify bot added a commit that referenced this pull request May 8, 2025
@mergify mergify bot merged commit 4b9c16f into sigp:unstable May 8, 2025
31 checks passed
mergify bot pushed a commit that referenced this pull request May 15, 2025
This PR relates to:
- #7199 -> workspace_filter has been enabled (dependency logging has been disabled)
- #7394 -> file logging has been optionally enabled

Building on these, this PR enables dependency logging for the simulators. The logs are written to separate files.

The libp2p/discv5 logs:
- are saved to the directory specified with `--log-dir`
- respect the `RUST_LOG` environment variable for log level configuration
mergify bot pushed a commit that referenced this pull request Jun 6, 2025
Our basic sim test has been [flaky](https://github.com/sigp/lighthouse/actions/runs/15458818777/job/43515966229) for some time, and it seems to have gotten worse since the Electra fork was added in #7199.

It looks like the GitHub runner is struggling with the load: it currently runs 7 nodes on a 4-CPU runner, which is definitely too much. We could consider moving this to run on our self-hosted runner, but I think running 7 nodes is unnecessary and we can probably trim this test down.


  Reduce the number of basic sim test nodes from 7 (3 BN + 3 Proposer BN + 1 extra) to 4 (2 BN + 1 Proposer BN + 1 extra).

If we want to run more nodes, we'd have to consider running on self hosted runners.