
Conversation

@rjl493456442 (Member) commented on Apr 4, 2025

This pull request enhances the block prefetcher by executing transactions in parallel
to warm the cache alongside the main block processor.

Unlike the original prefetcher, which only executes the next block and is limited to chain
syncing, the new implementation can be applied to any block. This makes it useful not
only during chain sync but also for regular block insertion after the initial sync.
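At a high level, the mechanism looks like the sketch below. This is not the PR's actual code: warmBlock and executeForWarmup are illustrative stand-ins for running each transaction through the EVM, and the real implementation caps the number of concurrent workers (see the review discussion below).

import (
    "sync"

    "github.com/ethereum/go-ethereum/core/state"
    "github.com/ethereum/go-ethereum/core/types"
)

// Execute every transaction against a throwaway copy of the parent state,
// purely for the account/storage/code reads it triggers. The results are
// discarded; only the warmed caches matter.
func warmBlock(block *types.Block, statedb *state.StateDB, signer types.Signer) {
    var wg sync.WaitGroup
    for _, tx := range block.Transactions() {
        wg.Add(1)
        go func(tx *types.Transaction) {
            defer wg.Done()
            // A private state copy per transaction avoids racing with the
            // main processor and with the other warm-up goroutines.
            executeForWarmup(statedb.Copy(), tx, signer)
        }(tx)
    }
    wg.Wait()
}

// executeForWarmup stands in for actually running the EVM; even just
// touching the sender and destination pulls the hot state into the caches.
func executeForWarmup(st *state.StateDB, tx *types.Transaction, signer types.Signer) {
    if from, err := types.Sender(signer, tx); err == nil {
        st.GetBalance(from)
    }
    if to := tx.To(); to != nil {
        st.GetCode(*to)
    }
}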

TODO

  • Experiment with whether state hashing is necessary in the block prefetcher (it may duplicate the work of the trie prefetcher)

@rjl493456442 requested a review from holiman as a code owner on April 4, 2025 05:21
@rjl493456442 (Member, Author):

PR: bench05
Master: bench06

  • The PR is about 10% faster than master.
  • The speedup comes from faster account/storage reads.
  • Memory allocation and CPU usage are about 2× those of master.
[Six benchmark screenshots, captured 2025-04-04, omitted]
File: geth
Type: inuse_space
Time: 2025-03-31 19:57:06 CST
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) alloc_space
(pprof) top
Showing nodes accounting for 3139.56GB, 49.79% of 6305.50GB total
Dropped 2237 nodes (cum <= 31.53GB)
Showing top 10 nodes out of 245
      flat  flat%   sum%        cum   cum%
  886.65GB 14.06% 14.06%  1019.34GB 16.17%  github.com/ethereum/go-ethereum/trie.(*hasher).hashFullNodeChildren
  472.06GB  7.49% 21.55%   935.83GB 14.84%  github.com/ethereum/go-ethereum/trie.decodeFull
  463.89GB  7.36% 28.90%   463.89GB  7.36%  github.com/ethereum/go-ethereum/trie.decodeRef
     320GB  5.07% 33.98%      320GB  5.07%  github.com/ethereum/go-ethereum/core/vm.(*Memory).Resize
  298.84GB  4.74% 38.72%   298.84GB  4.74%  github.com/ethereum/go-ethereum/rlp.(*encBuffer).makeBytes
  227.90GB  3.61% 42.33%   227.90GB  3.61%  github.com/ethereum/go-ethereum/trie.(*tracer).onRead
  147.57GB  2.34% 44.67%   147.57GB  2.34%  github.com/ethereum/go-ethereum/common.RightPadBytes
  109.50GB  1.74% 46.41%   110.78GB  1.76%  github.com/ethereum/go-ethereum/rlp.(*encBuffer).writeBytes
  106.61GB  1.69% 48.10%   106.61GB  1.69%  golang.org/x/crypto/sha3.NewLegacyKeccak256
  106.55GB  1.69% 49.79%   106.55GB  1.69%  github.com/ethereum/go-ethereum/core/vm.codeBitmap

The main memory allocators are the trie loader and the trie hasher.

@rjl493456442 force-pushed the in-block-cachewarmmer branch 2 times, most recently from f4f1f5a to ce318a3 on April 6, 2025 11:55
@rjl493456442 force-pushed the in-block-cachewarmmer branch 2 times, most recently from ebec558 to 5abc763 on April 28, 2025 06:57
@rjl493456442 force-pushed the in-block-cachewarmmer branch 2 times, most recently from 8ef7604 to 271503f on May 5, 2025 02:41
@rjl493456442 (Member, Author):

@MariusVanDerWijden @fjl Please take a look. This PR is ready for review.

@rjl493456442 added this to the 1.15.12 milestone on May 5, 2025
return nil
}
// Preload the touched accounts and storage slots in advance
sender, err := types.Sender(signer, tx)
Review comment (Member):

Can this realistically fail? Only if the block is at a fork boundary and the signer changes, or if the signature is invalid, right? Shouldn't we just exit here, and otherwise always warm the sender?
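A minimal sketch of that suggestion, reusing the names from the hunk above (the GetBalance call is just an illustrative way to warm the account):

sender, err := types.Sender(signer, tx)
if err != nil {
    // Invalid signature, or the signer changed at a fork boundary: the
    // main processor will reject the block anyway, so stop prefetching.
    return nil
}
statedb.GetBalance(sender) // always warm the sender account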

statedb.IntermediateRoot(true)
// Preload the contract code if the destination has non-empty code
if account != nil && !bytes.Equal(account.CodeHash, types.EmptyCodeHash.Bytes()) {
reader.Code(*tx.To(), common.BytesToHash(account.CodeHash))
Review comment (Member):

Is this faster than blindly loading the code?

Review comment (Member):

Should we also follow 7702 delegations here already?

Reply from @rjl493456442 (Member, Author), May 8, 2025:

> Is this faster than blindly loading the code?

Not sure, but it's cheap anyway.
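On the 7702 question above: following a delegation during warm-up could look roughly like the sketch below. It assumes the reader also exposes an Account lookup alongside Code, and uses types.ParseDelegation to detect the 0xef0100 ‖ address delegation designator; none of this is the PR's actual code.

code, _ := reader.Code(*tx.To(), common.BytesToHash(account.CodeHash))
if target, ok := types.ParseDelegation(code); ok {
    // The destination is an EIP-7702 delegated account: warm the
    // delegate's account and code as well.
    if delegated, _ := reader.Account(target); delegated != nil {
        reader.Code(target, common.BytesToHash(delegated.CodeHash))
    }
}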

@rjl493456442 force-pushed the in-block-cachewarmmer branch from 9260d8e to 8be2f84 on May 8, 2025 02:15
// This operation incurs significant memory allocations due to
// trie hashing and node decoding. TODO(rjl493456442): investigate
// ways to mitigate this overhead.
stateCpy.IntermediateRoot(true)
Review comment (Member):

We're only checking the interrupt at the beginning of the call. That was fine previously, when we executed the transactions linearly, but now the interrupt will most likely not stop any work from being done, since all goroutines are likely to be past the entry point. I'm wondering whether it would make sense to start a second goroutine that does something like this:

go func(evm *vm.EVM, interrupt *atomic.Bool) {
    for {
        time.Sleep(time.Millisecond)
        if interrupt != nil && interrupt.Load() {
            evm.Cancel() // abort any in-flight EVM execution
            return
        }
    }
}(evm, interrupt)

(or something similar, you get the gist)

Reply from @rjl493456442 (Member, Author):

Not really. We limit the parallelism to runtime.NumCPU() / 2 workers. With 16 available CPU cores, only 8 goroutines are created, and transactions are assigned to these workers in order.

If the prefetching is terminated, there is still a very good chance of stopping or preventing the subsequent transaction executions, as sketched below.
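Concretely, the dispatch pattern could look like this (a sketch under assumed names: applyTx stands in for the per-transaction execution, and the channel-based fan-out is illustrative rather than the PR's exact code):

import (
    "runtime"
    "sync"
    "sync/atomic"

    "github.com/ethereum/go-ethereum/core/types"
)

func warm(txs []*types.Transaction, interrupt *atomic.Bool) {
    workers := max(1, runtime.NumCPU()/2)
    txCh := make(chan *types.Transaction, len(txs)) // buffered, so feeding never blocks
    var wg sync.WaitGroup
    for i := 0; i < workers; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for tx := range txCh {
                // Re-check the interrupt before each transaction: once the
                // prefetch is terminated, the remaining transactions are
                // skipped even if in-flight ones run to completion.
                if interrupt != nil && interrupt.Load() {
                    return
                }
                applyTx(tx) // hypothetical per-transaction execution
            }
        }()
    }
    for _, tx := range txs {
        txCh <- tx
    }
    close(txCh)
    wg.Wait()
}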

Review comment (Member):

Ah yeah, I missed that. Makes sense.

@MariusVanDerWijden (Member):

Allocations are really a bit crazy :D, going up to 500 MB/s. Just added two nitpicks; otherwise this looks good to me.
As discussed on stabby, we should merge and fix up the allocation bit later.

@MariusVanDerWijden (Member) left a review:

LGTM

@rjl493456442 merged commit 485ff4b into ethereum:master on May 8, 2025 (3 of 4 checks passed)
howjmay pushed a commit to iotaledger/go-ethereum that referenced this pull request on Aug 27, 2025
gballet pushed a commit to gballet/go-ethereum that referenced this pull request on Sep 11, 2025