Precompile: Frontend and backend for building circuits #799
dc664af to
929eddf
Compare
Suggestions for #799 Feel free to pick and choose from the suggestions. I talk about most of them on your PR. --------- Co-authored-by: dreamATD <[email protected]>
naure
left a comment
First pass on gkr_iop. It makes sense so far.
16b57f3 to
29061f1
Compare
Suggestions for #799 Feel free to pick and choose from the suggestions. I talk about most of them on your PR. --------- Co-authored-by: dreamATD <[email protected]>
cffdd03 to
d51562b
Compare
hero78119
left a comment
Awesome job!
I left a few comments in separate sections because the PR is large and I reviewed it in several sittings.
Most of the code-reuse utilities can be handled later. I think the most important point is to try one precompile first (e.g. keccak-f) and benchmark its preliminary performance. Once it meets the requirements, we can proceed to more engineering polish :)
7762988 to
87d1a30
Compare
Suggestions for #799 Feel free to pick and choose from the suggestions. I talk about most of them on your PR. --------- Co-authored-by: dreamATD <[email protected]>
87d1a30 to
88f9b00
Compare
related to #191
Suggestions for #799 Feel free to pick and choose from the suggestions. I talk about most of them on your PR. --------- Co-authored-by: dreamATD <[email protected]>
88f9b00 to
eb4c9cb
Compare
Remove buffers and replace the underlying util functions.
Add comments and fix some tiny bugs
Suggestions for 'Frontend and backend for building circuits' (#801)
Suggestions for #799 Feel free to pick and choose from the suggestions. I talk about most of them on your PR. --------- Co-authored-by: dreamATD <[email protected]>
Refine according to comments
refine the protocol prover and verifier structs
Add more comments
Tiny fix according to the latest comments.
eb4c9cb to
ba053c2
Compare
To close issue #632

The io is named `debug_println` in the guest program debug build, assuming there is no "println!" use case in guest programs.

In debug builds, we extend the stack address a bit to cover a reserved 256k for io. This extra reserved space is also reflected in the linker script, so writes to this region won't get any complaints from either the ELF or the RISC-V emulator.

Besides, this PR also fixes a previous problem where meaningful symbols in the bss/sbss sections were skipped because their values are 0. We need to reserve and pad space to cover them, since they might be static variables that are initialized with 0 or uninitialized. Without doing so, the emulator will also complain that the region is not writable.

- clean up the previous workaround in the guest program for io
- extend the stack address for the io consistency check during debug builds
- refactor `load_elf` to address the bss/sbss padding issue
- e2e command also shows the io result
- respect the profile when compiling guest program examples

A guest program with IO

```bash
cargo run --release --features sanity-check --package ceno_zkvm --bin e2e -- --platform=ceno --hints=10 --public-io=4191 examples/target/riscv32im-ceno-zkvm-elf/release/examples/ceno_rt_io
cargo run --features sanity-check --package ceno_zkvm --bin e2e -- --platform=ceno --hints=10 --public-io=4191 examples/target/riscv32im-ceno-zkvm-elf/debug/examples/ceno_rt_io
```
To close #936

### Design rationales
- introduce `VirtualPolynomialsBuilder` to lift a witness of "ArcPoly" type into an expression container, so that witnesses can take part in calculations in the expression domain
- apply `VirtualPolynomialsBuilder` in the tower prover.
- keep scalars in the base field where possible by introducing an `Either<Base, Ext>` type
- reserve room in the design for the "eq" degree -1 optimisation
> this part of the work hasn't been done yet and is left as future work :)

`VirtualPolynomialsBuilder` is more like a util function for the ceno main sumcheck flow. For the GKR layer circuit in gkr_iop #799, the expression system will be applied directly on the chip-builder and skip `VirtualPolynomialsBuilder`.

### benchmark
There is no impact on the e2e benchmark before/after this change, which is expected.

2^20
```
fibonacci_max_steps_1048576/prove_fibonacci/fibonacci_max_steps_1048576
                        time:   [2.3583 s 2.3709 s 2.3848 s]
                        change: [-1.8405% -1.0740% -0.2480%] (p = 0.03 < 0.05)
                        Change within noise threshold.
```

2^21
```
fibonacci_max_steps_2097152/prove_fibonacci/fibonacci_max_steps_2097152
                        time:   [4.4650 s 4.4758 s 4.4867 s]
                        change: [-0.6673% -0.3122% +0.0493%] (p = 0.13 > 0.05)
                        No change in performance detected.
```

2^22
```
fibonacci_max_steps_4194304/prove_fibonacci/fibonacci_max_steps_4194304
                        time:   [9.0115 s 9.0574 s 9.1011 s]
                        change: [-1.0658% -0.3407% +0.3803%] (p = 0.40 > 0.05)
                        No change in performance detected.
```
sync up #799 with master
834e0d6 to
b38ac9d
Compare
### Change scope
- [x] unify `Expression` with ceno
- [x] unify sumcheck with ceno
- [ ] WIP GKR witness generation, take bit benchmark as example

---------

Co-authored-by: Zhang Zhuo <[email protected]>
```
RUST_LOG=info JEMALLOC_SYS_WITH_MALLOC_CONF=retain:true,metadata_thp:always,thp:always,dirty_decay_ms:-1,muzzy_decay_ms:-1,abort_conf:true cargo run --features jemalloc --package gkr_iop --bin lookup_keccak
```
> this only covers the prover flow, not the verifier flow yet

benchmark command
```
JEMALLOC_SYS_WITH_MALLOC_CONF=retain:true,metadata_thp:always,thp:always,dirty_decay_ms:-1,muzzy_decay_ms:-1,abort_conf:true cargo bench -p gkr_iop --features jemalloc --bench lookup_keccakf
```

Benchmark results on an AMD EPYC 32-core machine

| Version                | Throughput (keccak/s)  |
|------------------------|------------------------|
| Ceno Keccak version    | 4215                   |
| Plonky3 + Baby Bear    | 1188.47                |
| Plonky3 + Goldilocks   | 683.05                 |
| Ceno (textbook gkr)    | 128                    |

---------

Co-authored-by: Zhang Zhuo <[email protected]>
hero78119
left a comment
amazing work with many inspiring new designs 👍 !!
### Change
This PR syncs with ceno master and rolls back part of the changes to ensure the ceno main-flow benchmark is not affected.

### benchmark against master

| Benchmark                        | Median Time (s)  | Median Change (%)                   |
|----------------------------------|------------------|-------------------------------------|
| fibonacci_max_steps_1048576      | 2.1283           | +2.0905% (Change within noise)      |
| fibonacci_max_steps_2097152      | 3.6231           | +0.9229% (No change in performance) |
| fibonacci_max_steps_4194304      | 6.4747           | -0.1104% (No change in performance) |

---------

Co-authored-by: Zhang Zhuo <[email protected]>
Co-authored-by: xkx <[email protected]>
Co-authored-by: Akase Haruka <[email protected]>
This PR builds on top of #799 with one extra commit, 48ded1a, to introduce backend expressions, which are cached in the constraint system. This aligns the design with the precompile so that it's easier for the next refactoring step to introduce the precompile chip in the main flow.

The main sumcheck read/write lookup expression was simplified, as the post `evaluate()` was also removed.

### Expression
Expressions are simplified into 2 kinds: frontend and backend expressions.
- frontend expression: expression with Witin/StructuralWitin/Fixed, in recursive/nested style
- backend expression: expression with Witin only, in monomial style.

After circuit setup, the contents of both expressions are fully known and frozen. At runtime, we can take the backend expression and evaluate its scalars with the "challenge/instance"; the final expression can then be put into sumcheck.

### benchmark
The nice thing is that there is no performance difference before/after the change.

| Benchmark                        | Median Time (s)  | Median Change (%)                            |
|----------------------------------|------------------|----------------------------------------------|
| fibonacci_max_steps_1048576      | 2.0641           | -0.9869% (No change in performance detected) |
| fibonacci_max_steps_2097152      | 3.5514           | -1.0748% (Change within noise threshold)     |
| fibonacci_max_steps_1048576      | 2.0641           | -0.9869% (No change in performance detected) |
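
As a rough illustration of the frontend/backend split described under "Expression", the sketch below flattens a nested expression into monomial form and checks that both forms evaluate to the same value. Everything here is a toy assumption rather than ceno's actual types: `Expr`, `Monomial`, `to_monomials`, `eval_frontend` and `eval_backend` are hypothetical names, plain `i64` arithmetic stands in for field arithmetic, and StructuralWitin/Fixed are left out.

```rust
/// Toy "frontend" expression in nested/recursive style (hypothetical type).
#[derive(Clone, Debug)]
enum Expr {
    WitIn(usize),
    Challenge(usize),
    Constant(i64),
    Sum(Box<Expr>, Box<Expr>),
    Product(Box<Expr>, Box<Expr>),
}

/// Toy "backend" term: coeff * prod(challenges) * prod(witins).
/// The scalar part (coeff and challenges) can be evaluated at runtime,
/// leaving only a product of witnesses per term.
#[derive(Clone, Debug)]
struct Monomial {
    coeff: i64,
    challenges: Vec<usize>,
    witins: Vec<usize>,
}

/// Flatten a nested expression into a sum of monomials (like terms are not merged).
fn to_monomials(e: &Expr) -> Vec<Monomial> {
    match e {
        Expr::WitIn(i) => vec![Monomial { coeff: 1, challenges: vec![], witins: vec![*i] }],
        Expr::Challenge(i) => vec![Monomial { coeff: 1, challenges: vec![*i], witins: vec![] }],
        Expr::Constant(c) => vec![Monomial { coeff: *c, challenges: vec![], witins: vec![] }],
        Expr::Sum(a, b) => {
            let mut out = to_monomials(a);
            out.extend(to_monomials(b));
            out
        }
        Expr::Product(a, b) => {
            // Distribute the product over both sums of monomials.
            let (la, lb) = (to_monomials(a), to_monomials(b));
            let mut out = Vec::with_capacity(la.len() * lb.len());
            for x in &la {
                for y in &lb {
                    out.push(Monomial {
                        coeff: x.coeff * y.coeff,
                        challenges: [x.challenges.clone(), y.challenges.clone()].concat(),
                        witins: [x.witins.clone(), y.witins.clone()].concat(),
                    });
                }
            }
            out
        }
    }
}

/// Evaluate the nested form directly.
fn eval_frontend(e: &Expr, wits: &[i64], chals: &[i64]) -> i64 {
    match e {
        Expr::WitIn(i) => wits[*i],
        Expr::Challenge(i) => chals[*i],
        Expr::Constant(c) => *c,
        Expr::Sum(a, b) => eval_frontend(a, wits, chals) + eval_frontend(b, wits, chals),
        Expr::Product(a, b) => eval_frontend(a, wits, chals) * eval_frontend(b, wits, chals),
    }
}

/// Evaluate the monomial form: first the scalar, then the witness product.
fn eval_backend(ms: &[Monomial], wits: &[i64], chals: &[i64]) -> i64 {
    ms.iter()
        .map(|m| {
            let scalar = m.challenges.iter().fold(m.coeff, |acc, &c| acc * chals[c]);
            m.witins.iter().fold(scalar, |acc, &w| acc * wits[w])
        })
        .sum()
}

/// Example expression: (w0 + c0) * (w1 + 2).
fn example_expr() -> Expr {
    Expr::Product(
        Box::new(Expr::Sum(Box::new(Expr::WitIn(0)), Box::new(Expr::Challenge(0)))),
        Box::new(Expr::Sum(Box::new(Expr::WitIn(1)), Box::new(Expr::Constant(2)))),
    )
}

fn main() {
    let e = example_expr();
    let ms = to_monomials(&e);
    let (wits, chals) = ([3, 4], [5]);
    // Both forms agree: (3 + 5) * (4 + 2) = 48.
    println!("{} {}", eval_frontend(&e, &wits, &chals), eval_backend(&ms, &wits, &chals));
}
```

In this toy model the flattening happens once, which mirrors the idea that both forms are fixed after circuit setup and only the scalars depend on runtime challenges.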
This is an implementation of the expression-based, plonkish-like GKR
IOP protocol. The circuit is represented by a `Chip`, which holds all
the information needed to process the commit phases and the GKR proving
phase. The current implementation assumes there are two commit phases.
To process the GKR phase, we extract a `GKRCircuit` from the chip and
run the GKR protocol. As for implementation status, the GKR phase is
ready for review, while the commit phases haven't been finalized.

Defining a GKR IOP protocol for a chip involves defining
`build_commit_phase`, `build_commit_phase2` and `build_gkr_phase`.
Specifically, `build_gkr_phase` mainly builds the GKR layers in
reverse order. Besides specifying the expressions, a layer may need to
transfer evaluations from an input of a succeeding layer to an output
of the current layer, or even perform some computation before feeding
them to the current layer. To simplify these cases, we use an
evaluation tape to hold the evaluations and `EvalExpression` to define
the computation. Each layer input is assigned a position in the
evaluation tape. `EvalExpression` is defined as follows:
```rust
#[derive(Clone, Debug)]
pub enum EvalExpression {
    Single(usize),
    Linear(usize, Constant, Constant),
    Partition(Vec<Box<EvalExpression>>, Vec<(usize, Constant)>),
}
```
The variants describe how to compute the output evaluations. For more
details, please refer to
[gkr_iop/src/evaluation.rs](https://github.com/scroll-tech/ceno/blob/tianyi/refactor-prover/gkr_iop/src/evaluation.rs).
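
To make the evaluation tape idea concrete, here is a minimal sketch of how such an expression might be evaluated against a tape. It rests on assumptions that the linked file may not share: `Constant` is modelled as a plain `u64` field element, `Single(i)` is taken to read tape position `i`, `Linear(i, a, b)` is taken to mean `a * tape[i] + b`, the `Partition` variant is left out, and `evaluate`, `mul_mod` and `add_mod` are hypothetical helpers.

```rust
/// Stand-in for the crate's `Constant` type (assumption): a plain field element.
type Constant = u64;

/// Goldilocks prime, used here only as a toy modulus.
const P: u64 = 0xFFFF_FFFF_0000_0001;

/// Reduced copy of the enum for illustration; `Partition` is omitted.
#[derive(Clone, Debug)]
pub enum EvalExpression {
    Single(usize),
    Linear(usize, Constant, Constant),
}

/// Modular multiplication via a 128-bit intermediate.
fn mul_mod(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

/// Modular addition via a 128-bit intermediate.
fn add_mod(a: u64, b: u64) -> u64 {
    ((a as u128 + b as u128) % P as u128) as u64
}

impl EvalExpression {
    /// Assumed semantics: `Single(i)` reads `tape[i]`;
    /// `Linear(i, a, b)` computes `a * tape[i] + b` in the field.
    pub fn evaluate(&self, tape: &[u64]) -> u64 {
        match self {
            EvalExpression::Single(i) => tape[*i],
            EvalExpression::Linear(i, a, b) => add_mod(mul_mod(*a, tape[*i]), *b),
        }
    }
}

fn main() {
    // Each layer input owns one slot on the tape.
    let tape = vec![7u64, 11, 13];
    // An output that forwards slot 2 unchanged.
    println!("{}", EvalExpression::Single(2).evaluate(&tape)); // prints 13
    // An output computed as 3 * tape[1] + 5.
    println!("{}", EvalExpression::Linear(1, 3, 5).evaluate(&tape)); // prints 38
}
```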
Here are some subsequent tasks:
- [ ] Parallelize the vector evaluations under
`subprotocols/src/expression/`.
- [ ] Devirgo migration.
- [ ] Benchmarks.
- [ ] Keccak example and benchmarks.
Although the tasks above still need to be done, I suggest starting the
first round of review now. I would like to see comments from @naure and
@hero78119 so that I can adjust the design before moving forward.
**Upd:** The design doc: https://hackmd.io/@sphere-liu/HyLR-h2L1g.
---------
Co-authored-by: Mihai <[email protected]>
Co-authored-by: mcalancea <[email protected]>
Co-authored-by: Sphere L <[email protected]>
Co-authored-by: Ming <[email protected]>
Co-authored-by: Zhang Zhuo <[email protected]>
Co-authored-by: xkx <[email protected]>
Co-authored-by: Akase Haruka <[email protected]>