@LouisTsai-Csie
Collaborator

@LouisTsai-Csie LouisTsai-Csie commented Nov 4, 2025

🗒️ Description

Fixed Opcode Count Benchmark

This update introduces a fixed opcode count benchmark scenario.
A new flag, --fixed-opcode-count, and a new test marker, gas_ref, have been added. Only tests marked with gas_ref support the fixed opcode count feature.

Example command:

fill -v tests/benchmark/compute/instruction/<test> \
    --fixed-opcode-count 20 \
    --clean -m benchmark

Technical Notes

  1. Regular and gas repricing reference tests should run under the normal benchmark command.
  2. When specifying --fixed-opcode-count, only gas repricing reference tests will be executed.
  3. Currently, this feature is supported only for benchmark tests written using the benchmark test wrapper and code generator.
  4. The current benchmark process is not fully optimized, but it can be refactored later.
  5. When running with --fixed-opcode-count, the gas limit defaults to 1000M gas. Manually setting --gas-benchmark-values in the same command triggers an error.
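
The interaction between the two flags in note 5 can be sketched as follows. This is a minimal illustration, not the framework's implementation: the function `resolve_gas_limit` and the constant name are hypothetical, and only the 1000M default and the error on combining the flags come from the notes above.

```python
from typing import Optional, Sequence

# Default taken from the notes above: 1000M gas in fixed-opcode-count mode.
FIXED_OPCODE_COUNT_GAS_LIMIT = 1_000_000_000


def resolve_gas_limit(
    fixed_opcode_count: Optional[str],
    gas_benchmark_values: Optional[Sequence[int]],
) -> Optional[int]:
    """Return the gas limit forced by fixed-opcode-count mode, or None.

    Hypothetical helper: None means "normal benchmark mode", where the
    gas limits come from --gas-benchmark-values as usual.
    """
    if fixed_opcode_count is None:
        return None
    if gas_benchmark_values:
        # The two options are mutually exclusive per the notes above.
        raise ValueError(
            "--gas-benchmark-values cannot be combined with --fixed-opcode-count"
        )
    return FIXED_OPCODE_COUNT_GAS_LIMIT
```

Raising early keeps a misconfigured run from silently using one of the two settings.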

Benchmark Pattern

  1. Extract the parameters (setup, attack_block) from the benchmark test wrapper and generate a contract that iterates the attack_block 1000 times.
  2. Generate another contract that calls the first contract fixed-opcode-count times.
  3. This ensures the total opcode execution count equals 1000 × fixed-opcode-count.

Example
Setting --fixed-opcode-count 200 means executing the opcode 200 × 1000 = 200,000 times in total.
The first contract runs 1000 iterations per call, while the second contract repeats those calls 200 times.
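
The arithmetic above can be checked with a small simulation. This is only a sketch: it models the two contracts as plain Python functions (`inner_contract` and `outer_contract` are hypothetical names) and counts executions rather than generating EVM bytecode.

```python
INNER_ITERATIONS = 1000  # repetitions of attack_block per call, per the description


def inner_contract(counter: list) -> None:
    """Model the first contract: run the attack block 1000 times."""
    for _ in range(INNER_ITERATIONS):
        counter[0] += 1  # one execution of the benchmarked opcode


def outer_contract(fixed_opcode_count: int) -> int:
    """Model the second contract: call the first one fixed_opcode_count times."""
    counter = [0]
    for _ in range(fixed_opcode_count):
        inner_contract(counter)
    return counter[0]


print(outer_contract(200))  # 200000
```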

🔗 Related Issues or PRs

issue #1604

✅ Checklist

  • All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    uvx tox -e static
  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start with type(scope):.
  • All: Considered adding an entry to CHANGELOG.md.
  • All: Considered updating the online docs in the ./docs/ directory.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).
  • Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
  • Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry to the list.
  • Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

@jsign
Contributor

jsign commented Nov 6, 2025

If --fixed-opcode-count isn't provided, would the tests behave as they do today, filling the block up to the requested gas limits? Or will those tests always need the flag?

@codecov-commenter

codecov-commenter commented Nov 6, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.08%. Comparing base (dedec64) to head (fa69c08).

Additional details and impacted files
@@             Coverage Diff              @@
##           forks/osaka    #1747   +/-   ##
============================================
  Coverage        86.08%   86.08%           
============================================
  Files              743      743           
  Lines            44072    44072           
  Branches          3891     3891           
============================================
  Hits             37938    37938           
  Misses            5656     5656           
  Partials           478      478           
Flag Coverage Δ
unittests 86.08% <ø> (ø)

@LouisTsai-Csie
Collaborator Author

@jsign Yes, by default it keeps the current behavior. If the flag is specified, it switches to the new fixed opcode count scenario. Do you think this is straightforward, or how could we improve the workflow here? Thanks

@jsign
Contributor

jsign commented Nov 6, 2025

I think we'll be interested in both styles, one for worst-case block gas limit and the other for regression-like analysis as planned.

I think this optional flag and defaulting to worst-case-gas-limit is quite good, so sgtm!

Member

@marioevz marioevz left a comment

In general looks good to me, but I left a couple of comments I feel we should address.

dest="fixed_opcode_count",
type=str,
default=None,
help="Specify fixed opcode counts for benchmark tests as a comma-separated list.",
IIUC, this value is multiplied by one thousand, so we should specify that in the help text here.
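
One way the help text could make the multiplier explicit, together with parsing of the comma-separated value. This is a sketch using the standard library's argparse rather than the project's pytest option hook; the helper `parse_fixed_opcode_counts` and the exact help wording are assumptions.

```python
import argparse
from typing import List, Optional


def parse_fixed_opcode_counts(raw: Optional[str]) -> List[int]:
    """Parse a value such as "20,50" into [20, 50]; None means the flag is unset."""
    if raw is None:
        return []
    return [int(part) for part in raw.split(",") if part.strip()]


parser = argparse.ArgumentParser()
parser.add_argument(
    "--fixed-opcode-count",
    dest="fixed_opcode_count",
    type=str,
    default=None,
    help=(
        "Fixed opcode counts for benchmark tests, in thousands, as a "
        "comma-separated list (e.g. 20 means 20,000 opcode executions)."
    ),
)

args = parser.parse_args(["--fixed-opcode-count", "20,50"])
counts = parse_fixed_opcode_counts(args.fixed_opcode_count)
print([c * 1000 for c in counts])  # [20000, 50000]
```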

chetna-mittal pushed a commit to gnosischain/execution-specs that referenced this pull request Nov 8, 2025
Contributor

@spencer-tb spencer-tb left a comment

Thanks. Added some comments.

@LouisTsai-Csie
Collaborator Author

Thanks for the review, @marioevz and @spencer-tb. I have updated the PR according to your comments, and it is ready for a second review.

I also received some feedback from Kamil. For gas repricing, we only want to run a specific test (a single parameter combination is enough). So I added extra filter logic to the repricing marker, so that you can select the benchmark parameter combination.

Example:

@pytest.mark.repricing(
    size=1024 * 1024,
    non_zero_data=True,
    zeros_topic=False,
    fixed_offset=True,
)
@pytest.mark.parametrize(
    "size,non_zero_data",
)
@pytest.mark.parametrize(
    "zeros_topic",
)
@pytest.mark.parametrize("fixed_offset")
def test_log(...)
...

This is flexible: in the normal case, we can simply apply the marker without configuring any parameters:

@pytest.mark.repricing
def test_codesize(
    benchmark_test: BenchmarkTestFiller,
) -> None:
    """Benchmark CODESIZE instruction."""
    benchmark_test(
        code_generator=ExtCallGenerator(attack_block=Op.CODESIZE),
    )
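
The filtering idea described above could look roughly like this. It is a sketch, not the plugin's code: `matches_repricing` is a hypothetical helper that compares the keyword arguments of the repricing marker with one parametrized combination, and the parameter values are illustrative.

```python
from typing import Any, Dict, List


def matches_repricing(marker_kwargs: Dict[str, Any], params: Dict[str, Any]) -> bool:
    """Return True if a parameter combination is selected by the marker.

    A marker without keyword arguments selects every combination.
    """
    return all(
        name in params and params[name] == value
        for name, value in marker_kwargs.items()
    )


# Illustrative parameter combinations for a test like test_log (assumed values).
combinations: List[Dict[str, Any]] = [
    {"size": 0, "non_zero_data": False, "zeros_topic": True, "fixed_offset": True},
    {"size": 1024 * 1024, "non_zero_data": True, "zeros_topic": False, "fixed_offset": True},
]

marker_kwargs = {
    "size": 1024 * 1024,
    "non_zero_data": True,
    "zeros_topic": False,
    "fixed_offset": True,
}

selected = [c for c in combinations if matches_repricing(marker_kwargs, c)]
print(len(selected))  # 1
```

With an empty `marker_kwargs`, as in the bare `@pytest.mark.repricing` case, every combination is kept.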

@jochem-brouwer
Member

I will check the existing benchmark tests to see whether we can convert them to use this "opcode count" metric directly. This would avoid adding the CLI option and would instead add the option to the tests themselves, favoring maintainability of the framework over maintainability of the tests.

Member

@marioevz marioevz left a comment

LGTM, thanks for the changes!

Branch needs a rebase and then we can merge.

if self.fixed_opcode_count is not None:
    max_iterations = min(max_iterations, 1000)

print(f"max_iterations: {max_iterations}")
This print could be a bit annoying; we should remove it IMO.

@LouisTsai-Csie
Collaborator Author

Hi @jochem-brouwer, I currently reuse the code_generator feature of the benchmark test wrapper, since this type of test is structured as:

  • setup: the initial stack elements.
  • attack_block: the opcode sequence that is repeated as the benchmark target. The repetition count is currently hardcoded to 1000.
  • cleanup: skipped for now, since there is no need to clean up the stack when there are no further iterations in the current call frame.

A low-hanging fruit would be to convert the existing tests to this format, but we could also find other ways to support this feature.
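
The layout described above can be sketched with opcode mnemonics as plain strings. This assumes the attack block is simply unrolled after the setup, which the thread does not spell out; `build_layout` and the sample opcodes are hypothetical and stand in for the real code generator.

```python
from typing import List

ATTACK_BLOCK_REPEATS = 1000  # hardcoded repetition count mentioned above


def build_layout(setup: List[str], attack_block: List[str]) -> List[str]:
    """Return setup followed by the attack block repeated 1000 times.

    No cleanup section is emitted, mirroring the comment above.
    """
    return setup + attack_block * ATTACK_BLOCK_REPEATS


layout = build_layout(
    setup=["PUSH1 0x01", "PUSH1 0x02"],
    attack_block=["ADD", "PUSH1 0x02"],
)
print(len(layout))  # 2002
```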

@spencer-tb
Contributor

I don't want to block this; LGTM from my side.
One suggestion: could we add some simple CI for the --fixed-opcode-count flag?
Inspired by: #1779

@LouisTsai-Csie LouisTsai-Csie merged commit 414f27b into ethereum:forks/osaka Nov 13, 2025
12 checks passed