Description
Since v7.0.0 we're seeing an increased number of state cache misses.
Looking at logs and backtraces from this branch (#7354), the misses can be categorised as:
- State loads of non-canonical tip states during pruning. Already removed by "Drop head tracker for summaries DAG" (#6744), which has been merged to `unstable`.
- State loads of canonical states on archive nodes only. These states are loaded during the migration here (`lighthouse/beacon_node/store/src/hot_cold_store.rs`, lines 3227 to 3231 in 54f7bc5):

      // This is some state that we want to migrate to the freezer db.
      // There is no reason to cache this state.
      let state: BeaconState<E> = store
          .get_hot_state(&state_root, false)?
          .ok_or(HotColdDBError::MissingStateToFreeze(state_root))?;

- State loads for blob verification of stale blobs.
- State loads for attestation verification of low-quality attestations (more likely on nodes with `--subscribe-all-subnets`).
I've already tried a few minor tweaks but none have been particularly effective. The one that works best is setting the cache size back to 128.
Things that don't work:
- Cache size 64: still results in frequent misses in all of the above cases.
- Fixing pruning to be LRU-with-favouritism rather than MRU-with-favouritism: 45a6f19.
- Both of the above together.
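To make the LRU-versus-MRU distinction concrete, here is a toy sketch of LRU-with-favouritism pruning. It is not Lighthouse's state cache: `ToyStateCache`, `Entry`, the `u64` keys and the `favoured` flag are all hypothetical, and it assumes "favouritism" means favoured entries are only evicted once no unfavoured entries remain. An MRU variant would pick the most recently used unfavoured entry instead, throwing away the states most likely to be needed next.

```rust
use std::collections::HashMap;

/// Hypothetical cache entry. `last_used` is a logical clock value,
/// `favoured` marks entries we would rather keep (an assumption).
struct Entry {
    last_used: u64,
    favoured: bool,
}

/// Toy cache keyed by a stand-in `u64` "state root".
struct ToyStateCache {
    capacity: usize,
    clock: u64,
    entries: HashMap<u64, Entry>,
}

impl ToyStateCache {
    fn new(capacity: usize) -> Self {
        Self { capacity, clock: 0, entries: HashMap::new() }
    }

    /// Insert or refresh an entry, then prune back down to `capacity`.
    fn touch(&mut self, root: u64, favoured: bool) {
        self.clock += 1;
        self.entries.insert(root, Entry { last_used: self.clock, favoured });
        self.prune();
    }

    /// LRU-with-favouritism: unfavoured entries are evicted before favoured
    /// ones, and within each group the least recently used goes first.
    fn prune(&mut self) {
        while self.entries.len() > self.capacity {
            // `false < true`, so unfavoured entries sort first; ties are
            // broken by the smallest (oldest) `last_used` value.
            let victim = self
                .entries
                .iter()
                .min_by_key(|(_, e)| (e.favoured, e.last_used))
                .map(|(root, _)| *root);
            match victim {
                Some(root) => {
                    self.entries.remove(&root);
                }
                None => break,
            }
        }
    }

    /// Roots still in the cache, sorted for stable comparison.
    fn surviving_roots(&self) -> Vec<u64> {
        let mut roots: Vec<u64> = self.entries.keys().copied().collect();
        roots.sort();
        roots
    }
}

/// Small usage example: capacity 2, one favoured and two unfavoured entries.
fn demo_eviction() -> Vec<u64> {
    let mut cache = ToyStateCache::new(2);
    cache.touch(1, true); // favoured, oldest
    cache.touch(2, false); // unfavoured
    cache.touch(3, false); // over capacity: root 2 is the oldest unfavoured entry
    cache.surviving_roots()
}
```

In `demo_eviction`, root 1 survives despite being the oldest entry because it is favoured, and root 2 is evicted as the least recently used unfavoured entry. Note that in this toy a small capacity full of favoured entries would immediately evict any new unfavoured insert, which is one way a too-small cache can keep missing regardless of eviction order.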
I think a v7.0.1 with the LRU fix and the default set back to 128 is probably the best option, as these cache misses do create unnecessary work, especially around epoch boundaries, which has an impact on node perf (so we can't just ignore them or downgrade the log level).
The downside to restoring the default of 128 is that it makes non-finality harder. However, given what we've learned, we now know that lowering the cache size during non-finality is effective, and can recommend this on our comms if the need arises.
Longer term we have plans for size-based pruning and intra-rebasing (#7062, which needs a port to `unstable` once the Milhouse dep bump is merged), which should make the non-finality case safer again.