Conversation

@hero78119 (Collaborator) commented Jun 4, 2025

Change

This PR syncs with ceno master and rolls back part of the changes to ensure the ceno mainflow benchmark is not affected.

Benchmark against master:

| Benchmark | Median Time (s) | Median Change (%) |
|-----------|-----------------|-------------------|
| fibonacci_max_steps_1048576 | 2.1283 | +2.0905% (change within noise) |
| fibonacci_max_steps_2097152 | 3.6231 | +0.9229% (no change in performance) |
| fibonacci_max_steps_4194304 | 6.4747 | -0.1104% (no change in performance) |

hero78119 and others added 30 commits May 19, 2025 14:43
Benchmarks show a significant amount of time is spent in glibc free (drop)
when objects go out of scope.

Following openvm, use
[jemalloc](https://github.com/openvm-org/openvm/blob/c771a213f5e7f0732e0ddbafb273e15d99c5049d/crates/vm/Cargo.toml#L56)
as the global allocator,
and set the jemalloc parameters following
https://github.com/openvm-org/openvm/blob/c771a213f5e7f0732e0ddbafb273e15d99c5049d/.github/workflows/benchmark-call.yml#L218
> I did not enable jemalloc's "background_thread: true", since a background
thread might take scheduling time away from other work, which could affect a
CPU-intensive program.

### change scope
- enable jemalloc by default when compiling ceno_cli
- support `cargo make cli` to install ceno_cli
- introduce a "jemalloc" feature
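Wiring an optional jemalloc global allocator behind a feature flag typically looks like the following minimal sketch. This assumes the `tikv-jemallocator` crate (the one openvm uses); the feature name matches this PR's "jemalloc" feature, but the exact crate version and wiring here are illustrative, not taken from this repository:

```rust
// Sketch: opt-in jemalloc as the global allocator behind a "jemalloc" feature.
// Assumes Cargo.toml declares (versions illustrative):
//   [dependencies]
//   tikv-jemallocator = { version = "0.6", optional = true }
//   [features]
//   jemalloc = ["dep:tikv-jemallocator"]

#[cfg(feature = "jemalloc")]
#[global_allocator]
static GLOBAL: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;
```

With this in place, `cargo build --features jemalloc` swaps the allocator, while default builds keep the system allocator.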

### benchmark

Benchmarked on a 32-core AMD EPYC with the command
`JEMALLOC_SYS_WITH_MALLOC_CONF="retain:true,metadata_thp:always,thp:always,dirty_decay_ms:-1,muzzy_decay_ms:-1,abort_conf:true"
cargo bench --bench fibonacci --features jemalloc --package ceno_zkvm --
--baseline opt-baseline`
 

| Benchmark | Average Time | Improvement | Throughput (instructions/sec) |
|-----------------|--------------|-------------|-------------------------------|
| fibonacci 2^20 | 2.0020 s | -14.74% | 523.76k |
| fibonacci 2^21 | 3.5903 s | -18.89% | 584.34k |
| fibonacci 2^22 | 6.6531 s | -24.69% | 630.28k |

---------

Co-authored-by: Zhang Zhuo <[email protected]>
## Motivation

We want to unify the prover's workflow for opcode circuits and table
circuits, as they follow the same kind of workflow, i.e.

1. infer the tower witness;
2. run the tower prover;
3. run the main sumcheck (optional for table circuits).

Before this PR, an **opcode** circuit packed multiple read/write/logup
records into a **single** tower, while a **table** circuit packs
read/write/logup records into one dedicated tower per read/write/logup
expression. We found that the table circuit's way of building the tower
tree is better than the opcode circuit's.
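As a toy illustration of the shared structure, a product tower is built by repeatedly multiplying adjacent pairs until a single root remains. This is a minimal sketch in plain Python over integers (real towers work over field elements, and the function name is hypothetical):

```python
def tower_layers(leaves):
    """Build a product tower: each layer halves the previous one by
    multiplying adjacent pairs, until a single root product remains."""
    layers = [leaves]
    while len(layers[-1]) > 1:
        prev = layers[-1]
        layers.append([prev[i] * prev[i + 1] for i in range(0, len(prev), 2)])
    return layers

# Toy read/write record values; in the "one tower per expression" style,
# each record expression gets its own such tower.
records = [3, 5, 2, 7]
print(tower_layers(records))  # [[3, 5, 2, 7], [15, 14], [210]]
```

The difference discussed above is only in how records are grouped into leaves: one tower per record expression (table style) versus several record sets packed into a single tower (old opcode style).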

## Performance

| benchmark | proof size (MB) | proving time |
|------------|-----------------|-------------|
| fibonacci 2^20 | 1.14 -> 1.2 (5%) | -0.8% |
| fibonacci 2^21 | 1.22 -> 1.28 (5%) |  -5% |
| fibonacci 2^22 | 1.3 -> 1.37 (5%) | -10%|
    
**New issue**: The proof size increase is due to having more `ProdSpec`
and `LogupSpec`, which implies more points and evaluations in `struct
TowerProof`. Note that after we abandon the old "interleaving" method,
the number of rounds per product spec and logup spec is the same, so we
can remove this new overhead in a follow-up PR.

## Impact 
Blocker for scroll-tech#923.

---------

Co-authored-by: sm.wu <[email protected]>
To serve various purposes, e.g. benchmarking
…oll-tech#954)

### Change Scope
- [x] example run failed in e2e
https://github.com/scroll-tech/ceno/blob/ef93198c83e3b4fcd7f9949ebbc07bc9c93e4de9/examples/examples/hashing.rs#L16
In e2e we only supported hints as u32 items written one at a time, but
some examples require the hint as a whole vector, so those guest
programs always failed because the hint could not be served properly.
- [x] move most verbose messages from `info` to `trace/debug` so the
default e2e output is cleaner
- [x] more comments and a polished readme

---------

Co-authored-by: Akase Haruka <[email protected]>
…-tech#956)

Extracted from scroll-tech#952.

We observed a bottleneck in the previous interpolation, which accounted
for most of the time due to `vector.extend` operations and many
allocations. This PR rewrites univariate extrapolation:
1. since the points to be interpolated are a fixed set, we can
pre-compute everything that requires a field inverse;
2. values are updated in place to avoid allocation.
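The pre-computation idea can be sketched with barycentric interpolation. In this illustration (plain Python with `Fraction` standing in for field elements; not the actual Ceno code), the barycentric weights are the only part that needs inversions, depend only on the fixed node set, and are computed once; evaluation is then a single accumulation pass with no intermediate vectors:

```python
from fractions import Fraction

def barycentric_weights(xs):
    # Precompute w_i = 1 / prod_{j != i} (x_i - x_j).
    # The inversions happen here, once, since the node set is fixed.
    ws = []
    for i, xi in enumerate(xs):
        prod = Fraction(1)
        for j, xj in enumerate(xs):
            if j != i:
                prod *= xi - xj
        ws.append(1 / prod)
    return ws

def extrapolate(xs, ws, ys, x):
    # Evaluate the interpolating polynomial at x in one pass:
    # p(x) = sum(w_i * y_i / (x - x_i)) / sum(w_i / (x - x_i))
    num = Fraction(0)
    den = Fraction(0)
    for xi, wi, yi in zip(xs, ws, ys):
        t = wi / (x - xi)
        num += t * yi
        den += t
    return num / den

xs = [Fraction(i) for i in range(4)]     # fixed node set 0..3
ws = barycentric_weights(xs)             # precomputed once, reused forever
ys = [Fraction(i ** 3) for i in range(4)]  # samples of p(x) = x^3
print(extrapolate(xs, ws, ys, Fraction(5)))  # -> 125
```

Since `ws` is reused across all sumcheck rounds, the per-evaluation cost is just multiplications and additions, which matches the motivation above.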

### benchmark
In Ceno's opcode main sumcheck we batch the different degree-> 1 polynomials
into one batch, so this function is used there.
It shows a slight improvement (~3%) on Fibonacci 2^24 e2e.

| Benchmark | Median Time (s) | Median Change (%) |
|----------------------------------|------------------|--------------------|
| fibonacci_max_steps_1048576 | 2.3978 | +0.9805% (no significant change) |
| fibonacci_max_steps_2097152 | 4.2579 | +1.7587% (change within noise) |
| fibonacci_max_steps_4194304 | 7.7561 | -3.5338% |
Built on top of scroll-tech#956 to address review comments:
clean up the point from the sumcheck proof, as the verifier should derive it itself;
refactor univariate interpolation into barycentric and unrolled versions.

Cross-reference: issue
scroll-tech/ceno-recursion-verifier#6
@hero78119 hero78119 requested a review from spherel June 4, 2025 12:36
@spherel (Member) left a comment:

LGTM!

@hero78119 hero78119 merged commit 7066fa8 into scroll-tech:tianyi/refactor-prover Jun 6, 2025
4 checks passed