feat(benchmark): support `--fixed-opcode-count` flag and tests #1747

LouisTsai-Csie · 2025-11-04T13:51:07Z

🗒️ Description

Fixed Opcode Count Benchmark

This update introduces a fixed opcode count benchmark scenario.
A new flag, --fixed-opcode-count, and a new test marker, gas_ref, have been added. Only tests marked with gas_ref support the fixed opcode count feature.

Example command:

fill -v tests/benchmark/compute/instruction/<test> \
    --fixed-opcode-count 20 \
    --clean -m benchmark

Technical Notes

Regular and gas repricing reference tests should run under the normal benchmark command.
When specifying --fixed-opcode-count, only gas repricing reference tests will be executed.
Currently, this feature is supported only for benchmark tests written using the benchmark test wrapper and code generator.
The current benchmark process is not fully optimized, but it can be refactored later.
When running with --fixed-opcode-count, the gas limit defaults to 1000M. Manually setting --gas-benchmark-values together with this flag triggers an error.
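
The last note can be read as a small validation rule. The following is only a sketch of that rule, not the plugin's code: the function name `check_benchmark_options` and the constant `DEFAULT_FIXED_OPCODE_GAS_LIMIT` are hypothetical, and the 1000M default is taken from the note above.

```python
from typing import Optional

# 1000M gas, as stated in the note above. The constant name is hypothetical.
DEFAULT_FIXED_OPCODE_GAS_LIMIT = 1_000_000_000


def check_benchmark_options(
    fixed_opcode_count: Optional[str],
    gas_benchmark_values: Optional[str],
) -> Optional[int]:
    """
    Return the gas limit to use in fixed opcode count mode, or None when the
    flag is absent and the normal benchmark flow should apply.
    """
    if fixed_opcode_count is None:
        # Normal benchmark mode: gas values are handled elsewhere.
        return None
    if gas_benchmark_values is not None:
        # The two options are described as mutually exclusive.
        raise ValueError(
            "--gas-benchmark-values cannot be combined with --fixed-opcode-count"
        )
    return DEFAULT_FIXED_OPCODE_GAS_LIMIT
```

Under these assumptions, `check_benchmark_options("20", None)` yields the 1000M default, while passing both options raises an error.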

Benchmark Pattern

Extract the parameters (setup, attack_block) from the benchmark test wrapper and generate a contract that iterates the attack_block 1000 times.
Generate another contract that calls the first contract fixed-opcode-count times.
This ensures the total opcode execution count equals 1000 × fixed-opcode-count.

Example
Setting --fixed-opcode-count 200 means executing the opcode 200 × 1000 = 200,000 times in total.
The first contract runs 1000 iterations per call, while the second contract repeats those calls 200 times.
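
The arithmetic above can be checked with a short sketch. The helper name `total_opcode_executions` is hypothetical; the inner iteration count of 1000 comes from the description.

```python
# Number of attack_block repetitions per call, per the description above.
INNER_ITERATIONS = 1000


def total_opcode_executions(
    fixed_opcode_count: int,
    inner_iterations: int = INNER_ITERATIONS,
) -> int:
    """Total executions = outer calls (the flag value) x inner iterations."""
    if fixed_opcode_count <= 0:
        raise ValueError("fixed_opcode_count must be a positive integer")
    return fixed_opcode_count * inner_iterations


# --fixed-opcode-count 200 -> 200 calls x 1000 iterations each.
print(total_opcode_executions(200))  # 200000
```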

🔗 Related Issues or PRs

issue #1604

✅ Checklist

All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
```
uvx tox -e static
```
All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
All: Considered adding an entry to CHANGELOG.md.
All: Considered updating the online docs in the ./docs/ directory.
All: Set appropriate labels for the changes (only maintainers can apply labels).
Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry to the list.
Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

Cute Animal Picture

jsign · 2025-11-06T15:27:05Z

If --opcode-count isn't provided, would it behave as it does today, filling the block up to the requested gas limits? Or will those tests always need an opcode-count flag?

codecov-commenter · 2025-11-06T15:29:45Z


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.08%. Comparing base (dedec64) to head (fa69c08).

Additional details and impacted files

@@             Coverage Diff              @@
##           forks/osaka    #1747   +/-   ##
============================================
  Coverage        86.08%   86.08%           
============================================
  Files              743      743           
  Lines            44072    44072           
  Branches          3891     3891           
============================================
  Hits             37938    37938           
  Misses            5656     5656           
  Partials           478      478

Flag	Coverage Δ
unittests	`86.08% <ø> (ø)`


LouisTsai-Csie · 2025-11-06T15:34:32Z

@jsign Yes, by default it keeps the current behavior. If the flag is specified, it switches to the new fixed opcode count scenario. Do you think this is straightforward, or how could we improve the workflow here? Thanks.

jsign · 2025-11-06T16:12:47Z

> @jsign Yes, by default it keeps the current behavior. If the flag is specified, it switches to the new fixed opcode count scenario. Do you think this is straightforward, or how could we improve the workflow here? Thanks.

I think we'll be interested in both styles, one for worst-case block gas limit and the other for regression-like analysis as planned.

I think this optional flag and defaulting to worst-case-gas-limit is quite good, so sgtm!

marioevz

In general looks good to me, but I left a couple of comments I feel we should address.

packages/testing/src/execution_testing/cli/pytest_commands/plugins/shared/benchmarking.py

marioevz · 2025-11-07T22:35:00Z

packages/testing/src/execution_testing/cli/pytest_commands/plugins/shared/benchmarking.py

+        dest="fixed_opcode_count",
+        type=str,
+        default=None,
+        help="Specify fixed opcode counts for benchmark tests as a comma-separated list.",


IIUC, this value is multiplied by one thousand, so we should specify that in the help text here.
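
To illustrate the point about the comma-separated value and the factor of one thousand, here is a minimal parsing sketch. The helper names `parse_fixed_opcode_counts` and `to_total_executions` are hypothetical and are not part of the plugin; only the comma-separated format and the 1000 multiplier come from this thread.

```python
from typing import List


def parse_fixed_opcode_counts(raw: str) -> List[int]:
    """Parse a comma-separated string such as "1,5,20" into positive ints."""
    counts: List[int] = []
    for item in raw.split(","):
        item = item.strip()
        if not item:
            # Tolerate stray commas and whitespace.
            continue
        value = int(item)  # Raises ValueError on non-numeric input.
        if value <= 0:
            raise ValueError(f"opcode count must be positive, got {value}")
        counts.append(value)
    if not counts:
        raise ValueError("no opcode counts given")
    return counts


def to_total_executions(counts: List[int]) -> List[int]:
    """Each count is in units of one thousand opcode executions."""
    return [count * 1000 for count in counts]


print(to_total_executions(parse_fixed_opcode_counts("1, 5,20")))
# [1000, 5000, 20000]
```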

tests/benchmark/compute/instruction/test_account_query.py


spencer-tb

Thanks. Added some comments.

packages/testing/src/execution_testing/cli/pytest_commands/plugins/shared/benchmarking.py

tests/benchmark/compute/instruction/test_account_query.py

packages/testing/src/execution_testing/cli/pytest_commands/plugins/shared/benchmarking.py

packages/testing/src/execution_testing/specs/benchmark.py

LouisTsai-Csie · 2025-11-11T03:09:27Z

Thanks for the review, @marioevz and @spencer-tb. I have updated the PR according to the comments, and it is ready for a second review.

I received some feedback from Kamil. For gas repricing, we only want to run a specific test (a single parameter combination is enough). So I added extra filter logic to the repricing marker, so that you can select the benchmark parameter combination.

Example:

@pytest.mark.repricing(
    size=1024 * 1024,
    non_zero_data=True,
    zeros_topic=False,
    fixed_offset=True,
)
@pytest.mark.parametrize(
    "size,non_zero_data",
)
@pytest.mark.parametrize(
    "zeros_topic",
)
@pytest.mark.parametrize("fixed_offset")
def test_log(...)
...

This is flexible: for the normal scenario, we can simply apply the marker without configuring any parameters:

@pytest.mark.repricing
def test_codesize(
    benchmark_test: BenchmarkTestFiller,
) -> None:
    """Benchmark CODESIZE instruction."""
    benchmark_test(
        code_generator=ExtCallGenerator(attack_block=Op.CODESIZE),
    )
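
The filtering idea behind the two examples above can be sketched as a plain predicate. This is an illustration only: `matches_repricing_filter` is a hypothetical helper, and treating a marker keyword that is missing from the test's parameters as a non-match is an assumption, not confirmed behavior of the plugin.

```python
from typing import Any, Dict


def matches_repricing_filter(
    marker_kwargs: Dict[str, Any],
    test_params: Dict[str, Any],
) -> bool:
    """
    Return True when every keyword given to the marker equals the
    corresponding parametrized value. A bare marker (no keywords)
    matches every parameter combination.
    """
    return all(
        name in test_params and test_params[name] == expected
        for name, expected in marker_kwargs.items()
    )


# Keywords from the test_log example above.
marker_kwargs = {
    "size": 1024 * 1024,
    "non_zero_data": True,
    "zeros_topic": False,
    "fixed_offset": True,
}

# Two illustrative parameter combinations (values are made up).
candidates = [
    {"size": 1024 * 1024, "non_zero_data": True, "zeros_topic": False, "fixed_offset": True},
    {"size": 0, "non_zero_data": False, "zeros_topic": True, "fixed_offset": False},
]

selected = [c for c in candidates if matches_repricing_filter(marker_kwargs, c)]
print(len(selected))  # 1
```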

jochem-brouwer · 2025-11-11T16:21:58Z

I will check whether the existing benchmark tests can be converted to use this "opcode count" metric. This would avoid adding this CLI option and would instead add the option to the tests, which favors maintainability of the framework over maintainability of the tests.

marioevz

LGTM, thanks for the changes!

Branch needs a rebase and then we can merge.

marioevz · 2025-11-11T21:48:23Z

packages/testing/src/execution_testing/specs/benchmark.py

+        if self.fixed_opcode_count is not None:
+            max_iterations = min(max_iterations, 1000)
+
+        print(f"max_iterations: {max_iterations}")


This print could be a bit annoying; we should remove it, IMO.

LouisTsai-Csie · 2025-11-12T07:26:46Z

Hi @jochem-brouwer, currently I reuse the code_generator feature in the benchmark test wrapper, since this type of test is structured as:

setup: the initial stack elements.
attack_block: the opcode sequence repeated as the benchmark target. The repetition count is hardcoded to 1000 for now.
cleanup: skipped for now, since there are no more iterations in the current call frame and the stack does not need to be cleaned up.

A low-hanging fruit would be to convert the existing tests to this format. But we could also find other ways to support this feature.

spencer-tb · 2025-11-12T16:11:38Z

I don't want to block this, as it LGTM from my side.
One comment: could we add some simple CI for the --fixed-opcode-count flag?
Inspired by: #1779

LouisTsai-Csie self-assigned this Nov 4, 2025

parithosh mentioned this pull request Nov 4, 2025

Collect First-Level Benchmark Data for Repricing Initiative ethpandaops/gas-lighting-tracker#6

Open

LouisTsai-Csie force-pushed the feat/fixed-op-count branch from 9097d5f to f46820c Compare November 6, 2025 09:49

LouisTsai-Csie marked this pull request as ready for review November 6, 2025 14:51

spencer-tb added the P-high label Nov 6, 2025

marioevz reviewed Nov 7, 2025

chetna-mittal pushed a commit to gnosischain/execution-specs that referenced this pull request Nov 8, 2025

feat(docs): auto-generate a page showing fill's command-line options (ethereum#1747) 19d3563

spencer-tb reviewed Nov 10, 2025

LouisTsai-Csie force-pushed the feat/fixed-op-count branch from f46820c to 92793f3 Compare November 11, 2025 03:00

marioevz approved these changes Nov 12, 2025

LouisTsai-Csie added 8 commits November 12, 2025 15:17

feat: implement fixed opcode count benchmark

34aa6f1

feat: implement reference test marker for repricing

4adf837

feat: support execute mode for repricing test

9345f2c

feat: add gas repricing test marker

5e49846

refactor: benchmark marker conflict logic

a52daa2

chore: rename reference test marker to repricing

94f3c7e

feat: filter reference test with specified parameter

acbb889

chore: fix linting

fa69c08

LouisTsai-Csie force-pushed the feat/fixed-op-count branch from 92793f3 to fa69c08 Compare November 12, 2025 07:22

LouisTsai-Csie requested a review from marioevz November 12, 2025 07:27

LouisTsai-Csie merged commit 414f27b into ethereum:forks/osaka Nov 13, 2025
12 checks passed

spencer-tb mentioned this pull request Nov 14, 2025

enhance(test-benchmark): use config file for fixed opcode count scenarios #1790

Draft

5 tasks

6 participants
