Skip to content

Conversation

@hargoniX
Copy link
Contributor

@hargoniX hargoniX commented Sep 30, 2025

This PR implements zero cost BaseIO by erasing the IO.RealWorld parameter from argument lists and structures. This is a major breaking change for FFI.

Concretely:

  • BaseIO is defined in terms of ST IO.RealWorld
  • EIO (and thus IO) is defined in terms of EST IO.RealWorld
  • The opaque Void type is introduced and the trivial structure optimization updated to account for it. Furthermore, arguments of type Void s are removed from the argument lists of the C functions.
  • ST is redefined as Void s -> ST.Out s a where ST.Out is a pair of Void s and a

This together has the following major effects on our generated code:

  • Functions that return BaseIO/ST/EIO/IO/EST now do not take the dummy world parameter anymore. To account for this FFI code needs to delete the dummy world parameter from the argument lists.
  • Functions that return BaseIO/ST now return their wrapped value directly. In particular BaseIO UInt32 now returns a uint32_t instead of a lean_object*. To account for this FFI code might have to change the return type and does not need to call lean_io_result_mk_ok anymore but can instead just return values right away (same with extracting values from BaseIO computations.
  • Functions that return EIO/IO/EST now only return the equivalent of an Except node which reduces the allocation size. The lean_io_result_mk_ok/lean_io_result_mk_error functions were updated to account for this already so no change is required.

Besides improving performance by dropping allocation (sizes) we can now also do fun new things such as:

@[extern "malloc"]
opaque malloc (size : USize) : BaseIO USize

@hargoniX
Copy link
Contributor Author

!bench

@hargoniX
Copy link
Contributor Author

(I expect a performance regression from this, just want to see by how much)

@leanprover-bot
Copy link
Collaborator

Here are the benchmark results for commit 9088219.
There were significant changes against commit 239c348:

  Benchmark                              Metric                    Change
  ==================================================================================
- Init.Data.List.Sublist async           branches                    5.7%   (40.2 σ)
- Init.Data.List.Sublist async           instructions                5.1%   (33.0 σ)
- Init.Prelude async                     branches                    5.0%   (27.1 σ)
- Init.Prelude async                     instructions                4.6%   (22.9 σ)
- Std.Data.DHashMap.Internal.RawLemmas   branch-misses               8.3%   (25.8 σ)
- Std.Data.DHashMap.Internal.RawLemmas   branches                    4.6%   (39.1 σ)
- Std.Data.DHashMap.Internal.RawLemmas   instructions                4.3%   (33.6 σ)
- Std.Data.Internal.List.Associative     wall-clock                  3.7%   (28.4 σ)
- big_beq                                branch-misses               6.8%   (31.7 σ)
- big_beq                                branches                    4.7%  (234.9 σ)
- big_beq                                instructions                4.3%  (233.1 σ)
- big_beq                                maxrss                      1.6%   (26.1 σ)
- big_beq_rec                            branch-misses               7.4%   (37.5 σ)
- big_beq_rec                            branches                    3.3%  (235.7 σ)
- big_beq_rec                            instructions                3.1%  (219.5 σ)
- big_deceq                              branch-misses               5.9%   (25.0 σ)
- big_deceq                              branches                    5.8%  (118.9 σ)
- big_deceq                              instructions                5.4%  (107.1 σ)
- big_deceq_rec                          branch-misses               6.8%  (129.7 σ)
- big_deceq_rec                          branches                    5.7%  (294.3 σ)
- big_deceq_rec                          instructions                5.2%  (297.1 σ)
- big_do                                 branches                    6.1%  (816.0 σ)
- big_do                                 instructions                5.7%  (721.7 σ)
- big_match                              branch-misses               6.8%   (25.3 σ)
- big_match                              branches                    5.8% (3470.6 σ)
- big_match                              instructions                5.3% (3023.6 σ)
- big_match                              maxrss                      1.5%  (144.4 σ)
- big_omega.lean                         branches                    5.5% (1069.0 σ)
- big_omega.lean                         instructions                4.6%  (255.3 σ)
- big_omega.lean MT                      branches                    5.4%  (479.8 σ)
- big_omega.lean MT                      instructions                4.6%  (444.5 σ)
- bv_decide_mul                          maxrss                      1.6%   (28.2 σ)
- bv_decide_rewriter.lean                branch-misses              21.0%   (44.5 σ)
- bv_decide_rewriter.lean                branches                    8.0% (1192.9 σ)
- bv_decide_rewriter.lean                instructions                7.5%  (968.8 σ)
- bv_decide_rewriter.lean                task-clock                 14.3%   (67.0 σ)
- bv_decide_rewriter.lean                wall-clock                 14.3%   (35.8 σ)
- grind_bitvec2.lean                     branch-misses              10.3%   (21.2 σ)
- grind_bitvec2.lean                     branches                    7.2%   (32.2 σ)
- grind_bitvec2.lean                     instructions                6.5%   (27.7 σ)
- grind_list2.lean                       branch-misses              11.2%   (32.6 σ)
- grind_list2.lean                       branches                    8.6%   (90.2 σ)
- grind_list2.lean                       instructions                7.9%   (74.6 σ)
- grind_ring_5.lean                      branch-misses               5.5%   (46.4 σ)
- grind_ring_5.lean                      branches                    1.4%  (215.2 σ)
- grind_ring_5.lean                      instructions                1.0%  (159.2 σ)
- hashmap.lean                           branches                    4.6% (2595.1 σ)
- hashmap.lean                           instructions                4.9% (2314.1 σ)
- identifier auto-completion             branches                    1.1%   (47.2 σ)
- identifier auto-completion             instructions                1.1%   (40.2 σ)
- import Lean                            branches                    1.6%   (29.8 σ)
- import Lean                            instructions                1.5%   (23.6 σ)
- iterators (elab)                       branch-misses              12.0%   (40.5 σ)
- iterators (elab)                       branches                   11.2%  (502.9 σ)
- iterators (elab)                       instructions               11.3%  (421.8 σ)
- iterators (interpreted)                branches                    1.6%  (355.7 σ)
- iterators (interpreted)                instructions                1.2%  (297.9 σ)
- lake build clean                       instructions                1.6%  (326.5 σ)
- lake config elab                       instructions               11.2%  (571.9 σ)
- lake config elab                       maxrss                      1.4%   (38.7 σ)
- lake config import                     instructions                1.6%   (38.9 σ)
- lake config tree                       instructions                1.7%  (190.4 σ)
- lake env                               instructions                1.7%   (32.3 σ)
- language server startup                branches                    5.2%  (107.8 σ)
- language server startup                instructions                5.0%   (83.5 σ)
- libleanshared.so                       binary size                13.1%
- mut_rec_wf                             branch-misses               9.5%   (48.6 σ)
- mut_rec_wf                             branches                    5.8%  (494.8 σ)
- mut_rec_wf                             instructions                5.3%  (405.0 σ)
- mut_rec_wf                             maxrss                      1.8%   (21.2 σ)
- omega_stress.lean async                branches                    4.8%  (135.6 σ)
- omega_stress.lean async                instructions                4.2%  (106.9 σ)
- omega_stress.lean async                maxrss                      2.6%   (36.8 σ)
- phashmap.lean                          branches                    2.5%  (398.5 σ)
- phashmap.lean                          instructions                2.5%  (318.1 σ)
- qsort                                  instructions                2.2%  (749.7 σ)
- reduceMatch                            instructions                3.4%   (87.1 σ)
- riscv-ast.lean                         branch-misses              13.6%   (68.4 σ)
- riscv-ast.lean                         branches                    7.3% (1228.2 σ)
- riscv-ast.lean                         instructions                6.8%  (948.7 σ)
- riscv-ast.lean                         task-clock                 10.1%   (46.8 σ)
- riscv-ast.lean                         wall-clock                 10.9%   (71.4 σ)
- simp_arith1                            branches                    4.6%   (58.0 σ)
- simp_arith1                            instructions                4.1%   (38.3 σ)
- simp_bubblesort_256                    branch-misses              47.1%   (75.3 σ)
- simp_bubblesort_256                    branches                   10.9%  (633.2 σ)
- simp_bubblesort_256                    instructions               10.2%  (611.5 σ)
- simp_congr                             branch-misses              14.7%   (40.0 σ)
- simp_congr                             branches                    3.4%   (92.3 σ)
- simp_congr                             instructions                3.0%   (78.6 σ)
- simp_local                             branch-misses              77.0%   (37.0 σ)
- simp_local                             branches                   14.1% (1085.4 σ)
- simp_local                             instructions               13.6%  (835.1 σ)
- simp_subexpr                           branch-misses              44.7%   (31.1 σ)
- simp_subexpr                           branches                   10.8%  (165.2 σ)
- simp_subexpr                           instructions               10.1%  (132.7 σ)
- stdlib                                 compilation (LCNF base)    30.2%   (48.7 σ)
- stdlib                                 compilation (LCNF mono)    24.0%   (20.8 σ)
- stdlib                                 congr simp thm              7.1%   (22.5 σ)
- stdlib                                 grind ac                   48.0%   (24.3 σ)
- stdlib                                 grind ring                 31.7%   (27.0 σ)
- stdlib                                 instructions                8.7%  (300.3 σ)
- stdlib                                 longest build path         11.8%   (36.8 σ)
- stdlib                                 longest rebuild path       10.8%   (39.4 σ)
- stdlib                                 process pre-definitions     3.5%   (23.6 σ)
- stdlib                                 task-clock                  6.8%   (35.2 σ)
- stdlib                                 wall-clock                 12.3%   (80.7 σ)
- stdlib size                            bytes .ir                  10.9%
- stdlib size                            bytes .olean.private        2.9%
- stdlib size                            lines C                    15.4%
- tests/bench/ interpreted               maxrss                      1.5%   (67.0 σ)
- tests/compiler                         sum binary sizes           11.6%
- treemap.lean                           branches                    1.1% (1530.3 σ)
- treemap.lean                           iterate                    53.1%   (78.3 σ)
- workspaceSymbols                       maxrss                      1.6%   (23.0 σ)
+ workspaceSymbols with new ranges       instructions               -1.2%  (-62.4 σ)

@Garmelon
Copy link
Contributor

@hargoniX hargoniX force-pushed the hbv/baseio_zero branch 4 times, most recently from 8dc7ac4 to 3a10225 Compare October 2, 2025 15:19
@hargoniX
Copy link
Contributor Author

hargoniX commented Oct 2, 2025

!bench

@leanprover-bot
Copy link
Collaborator

Here are the benchmark results for commit 3a10225.
There were significant changes against commit 663d4d2:

  Benchmark                              Metric                       Change
  ======================================================================================
- Init.Data.BitVec.Lemmas                maxrss                        71.5%    (29.8 σ)
- Init.Data.BitVec.Lemmas re-elab        maxrss                        20.5%    (74.2 σ)
- Init.Data.List.Sublist re-elab -j4     maxrss                       175.3%   (291.8 σ)
- Std.Data.DHashMap.Internal.RawLemmas   maxrss                       141.1%    (21.8 σ)
- Std.Data.Internal.List.Associative     maxrss                        98.0%    (98.1 σ)
+ big_beq                                branches                      -1.1%   (-69.6 σ)
- big_beq                                maxrss                        17.4%   (349.9 σ)
- big_beq_rec                            maxrss                        22.2%   (240.0 σ)
- big_deceq                              maxrss                         7.3%    (78.7 σ)
- big_deceq_rec                          maxrss                        10.3%   (263.5 σ)
+ big_do                                 branches                      -2.0%  (-105.6 σ)
+ big_do                                 instructions                  -2.0%   (-84.3 σ)
- big_do                                 maxrss                        30.9%   (119.7 σ)
- big_match                              maxrss                        13.8%   (199.9 σ)
- big_omega.lean                         maxrss                         5.0%    (24.6 σ)
- big_omega.lean MT                      maxrss                         4.7%    (87.4 σ)
- binarytrees                            maxrss                        13.2%   (285.9 σ)
+ bv_decide_mod                          maxrss                        -1.1%   (-86.7 σ)
- bv_decide_rewriter.lean                maxrss                         8.1%    (24.6 σ)
- grind_bitvec2.lean                     maxrss                        97.9%   (104.5 σ)
- grind_list2.lean                       maxrss                        39.7%    (32.5 σ)
+ iterators (elab)                       branches                      -7.8%  (-547.2 σ)
+ iterators (elab)                       instructions                  -5.0%  (-297.4 σ)
- iterators (elab)                       maxrss                       103.6%   (309.8 σ)
+ iterators (interpreted)                branches                      -1.1%  (-527.8 σ)
- iterators (interpreted)                maxrss                       102.7%  (1360.2 σ)
- lake config elab                       instructions                   1.9%    (21.2 σ)
- lake config elab                       maxrss                        15.5%   (157.7 σ)
+ libleanshared.so                       binary size                   -5.2%
- mut_rec_wf                             maxrss                        17.6%   (144.0 σ)
+ nat_repr                               wall-clock                    -5.3%   (-23.8 σ)
+ omega_stress.lean async                branch-misses                 -2.5%   (-36.1 σ)
- phashmap.lean                          branches                       1.9%   (191.1 σ)
- phashmap.lean                          instructions                   1.9%   (155.0 σ)
- phashmap.lean                          iterate                       22.6%    (31.5 σ)
- qsort                                  instructions                   2.1%   (825.2 σ)
+ rbmap_library                          instructions                  -1.2%  (-814.4 σ)
+ riscv-ast.lean                         branches                      -1.1%   (-80.0 σ)
- riscv-ast.lean                         maxrss                       114.5%   (171.4 σ)
+ simp_arith1                            branches                      -1.8%  (-299.1 σ)
- simp_arith1                            maxrss                        13.8%    (38.1 σ)
+ simp_bubblesort_256                    branches                      -1.2%   (-27.1 σ)
+ simp_bubblesort_256                    instructions                  -1.6%   (-28.7 σ)
+ simp_local                             branches                      -3.6%  (-202.7 σ)
+ simp_local                             instructions                  -4.4%  (-184.4 σ)
- simp_local                             maxrss                        21.9%   (124.0 σ)
+ simp_subexpr                           branch-misses                 -9.7%   (-21.1 σ)
+ simp_subexpr                           branches                      -1.2%   (-57.7 σ)
+ simp_subexpr                           instructions                  -1.7%   (-69.5 σ)
- stdlib                                 compilation (IR)               6.1%    (76.2 σ)
- stdlib                                 compilation (LCNF base)       26.5%    (37.7 σ)
- stdlib                                 compilation (LCNF mono)       13.7%    (49.2 σ)
+ stdlib                                 grind ematch                  -3.0%   (-98.5 σ)
- stdlib                                 instantiate metavars          14.7%    (42.6 σ)
- stdlib                                 let-to-have transformation     6.8%    (23.9 σ)
- stdlib                                 maxrss                       723.6% (43831.4 σ)
- stdlib                                 wall-clock                     8.3%    (39.0 σ)
+ stdlib size                            bytes .ir                     -7.3%
- stdlib size                            bytes .olean.private           2.9%
+ stdlib size                            lines C                       -7.2%
+ tests/compiler                         sum binary sizes              -4.6%
- workspaceSymbols                       maxrss                         1.8%    (22.2 σ)

@Garmelon
Copy link
Contributor

Garmelon commented Oct 2, 2025

!radar

@leanprover-radar
Copy link

leanprover-radar commented Oct 2, 2025

Benchmarking finished, results.

@hargoniX
Copy link
Contributor Author

hargoniX commented Oct 2, 2025

!bench

@leanprover-radar
Copy link

leanprover-radar commented Oct 2, 2025

Benchmark results for 0e5100c against 663d4d2 are in! @hargoniX

@leanprover-bot
Copy link
Collaborator

Here are the benchmark results for commit 0e5100c.
There were significant changes against commit 663d4d2:

  Benchmark                            Metric                    Change
  ================================================================================
- Init.Data.BitVec.Lemmas              maxrss                     72.9%   (65.3 σ)
- Init.Data.List.Sublist re-elab -j4   maxrss                    172.3%  (430.4 σ)
- Init.Prelude async                   maxrss                     67.9%   (31.0 σ)
- Std.Data.Internal.List.Associative   maxrss                     99.2%   (41.1 σ)
+ big_beq                              branches                   -1.6%  (-72.6 σ)
- big_beq                              maxrss                     16.1%   (89.9 σ)
- big_beq_rec                          maxrss                     20.7%  (160.5 σ)
+ big_deceq                            branches                   -1.6% (-113.9 σ)
+ big_deceq                            instructions               -1.1%  (-82.3 σ)
- big_deceq                            maxrss                      6.4%   (36.9 σ)
+ big_deceq_rec                        branches                   -1.6% (-218.6 σ)
+ big_deceq_rec                        instructions               -1.1% (-144.4 σ)
- big_deceq_rec                        maxrss                      9.0%   (44.1 σ)
- big_do                               maxrss                     30.2%  (101.3 σ)
- big_do                               task-clock                  4.2%   (23.2 σ)
- big_do                               wall-clock                  4.3%   (21.1 σ)
- big_match                            maxrss                     10.5%  (128.4 σ)
- big_omega.lean                       maxrss                      5.3%   (35.9 σ)
- binarytrees                          maxrss                     13.2%  (148.2 σ)
- bv_decide_rewriter.lean              maxrss                      8.4%  (577.7 σ)
- grind_bitvec2.lean                   maxrss                     97.1%   (44.3 σ)
- grind_list2.lean                     maxrss                     38.7%   (39.6 σ)
- identifier auto-completion           maxrss                      5.6%  (173.0 σ)
+ iterators (elab)                     branches                   -9.9% (-490.2 σ)
+ iterators (elab)                     instructions               -7.2% (-274.6 σ)
- iterators (elab)                     maxrss                    102.1%  (504.7 σ)
+ iterators (interpreted)              branches                   -1.5% (-548.6 σ)
- iterators (interpreted)              maxrss                    100.9%  (213.4 σ)
+ lake config elab                     instructions               -3.0% (-257.2 σ)
- lake config elab                     maxrss                     14.6%  (459.4 σ)
+ libleanshared.so                     binary size                -2.2%
- mut_rec_wf                           maxrss                     15.0%   (55.6 σ)
- riscv-ast.lean                       maxrss                    114.2%  (321.8 σ)
+ simp_arith1                          branches                   -2.8%  (-96.3 σ)
+ simp_arith1                          instructions               -1.8%  (-36.5 σ)
- simp_arith1                          maxrss                     13.6%   (40.6 σ)
+ simp_bubblesort_256                  branches                   -1.0%  (-41.8 σ)
+ simp_bubblesort_256                  instructions               -1.4%  (-36.7 σ)
+ simp_local                           branches                   -2.6% (-587.1 σ)
+ simp_local                           instructions               -3.2% (-964.5 σ)
- simp_local                           maxrss                     23.3%  (221.4 σ)
+ simp_subexpr                         branches                   -1.1%  (-35.1 σ)
+ simp_subexpr                         instructions               -1.5%  (-40.5 σ)
- stdlib                               compilation (LCNF base)     8.3%   (86.8 σ)
- stdlib                               compilation (LCNF mono)     7.9%   (22.8 σ)
+ stdlib                               instructions               -2.0%  (-28.6 σ)
- stdlib                               maxrss                    682.1% (2310.9 σ)
- stdlib                               process pre-definitions     5.7%   (32.9 σ)
- stdlib                               task-clock                  2.3%   (24.6 σ)
+ stdlib size                          bytes .ir                  -2.1%
+ stdlib size                          lines C                    -2.2%
+ tests/compiler                       sum binary sizes           -2.0%
+ treemap.lean                         iterate                   -11.6%  (-23.8 σ)
- workspaceSymbols                     maxrss                      1.2%   (52.5 σ)

@hargoniX
Copy link
Contributor Author

hargoniX commented Oct 2, 2025

!radar

@leanprover-radar
Copy link

leanprover-radar commented Oct 2, 2025

Benchmark results for 4afc85f against 663d4d2 are in! @hargoniX

@hargoniX hargoniX force-pushed the hbv/baseio_zero branch 3 times, most recently from b45e7fb to d4830d3 Compare October 6, 2025 11:47
@hargoniX
Copy link
Contributor Author

hargoniX commented Oct 6, 2025

!radar

@leanprover-radar
Copy link

leanprover-radar commented Oct 6, 2025

Benchmark results for d4830d3 against 663d4d2 are in! @hargoniX

@hargoniX
Copy link
Contributor Author

hargoniX commented Oct 6, 2025

!radar

@leanprover-radar
Copy link

leanprover-radar commented Oct 6, 2025

Benchmark results for 4b8007e against 663d4d2 are in! @hargoniX

@hargoniX hargoniX added the release-ci Enable all CI checks for a PR, like is done for releases label Oct 9, 2025
@hargoniX hargoniX force-pushed the hbv/baseio_zero branch 2 times, most recently from cdf0b85 to c446880 Compare October 10, 2025 21:46
@github-actions github-actions bot added the toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN label Oct 10, 2025
@leanprover-community-bot
Copy link
Collaborator

leanprover-community-bot commented Oct 10, 2025

Mathlib CI status (docs):

  • ❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the nightly-with-mathlib branch. Try git rebase 71ddf227d2effc7725bf20e655a806a461e26cfe --onto 705dac4f77aa60d749ee2b37d102518d772d6e53. You can force Mathlib CI using the force-mathlib-ci label. (2025-10-10 23:45:23)
  • ❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the nightly-with-mathlib branch. Try git rebase bd0b91de07cef14411cdff1e84dd42d5b0776f15 --onto efbbb0b230ce95653d25b59c83fd24a51a8bf363. You can force Mathlib CI using the force-mathlib-ci label. (2025-10-21 13:30:59)

@leanprover-bot
Copy link
Collaborator

leanprover-bot commented Oct 10, 2025

Reference manual CI status:

  • ❗ Reference manual CI will not be attempted unless your PR branches off the nightly-with-manual branch. Try git rebase 71ddf227d2effc7725bf20e655a806a461e26cfe --onto d3dda9f6d4428a906c096067ecb75e432afc4615. You can force reference manual CI using the force-manual-ci label. (2025-10-10 23:45:25)
  • ❗ Reference manual CI will not be attempted unless your PR branches off the nightly-with-manual branch. Try git rebase bd0b91de07cef14411cdff1e84dd42d5b0776f15 --onto d3dda9f6d4428a906c096067ecb75e432afc4615. You can force reference manual CI using the force-manual-ci label. (2025-10-21 13:31:01)

@hargoniX hargoniX force-pushed the hbv/baseio_zero branch 4 times, most recently from 943cd75 to fd243fc Compare October 21, 2025 12:07
@hargoniX
Copy link
Contributor Author

!radar

@leanprover-radar
Copy link

leanprover-radar commented Oct 21, 2025

Benchmark results for fd243fc against bd0b91d are in! @hargoniX

@hargoniX hargoniX marked this pull request as ready for review October 21, 2025 12:42
@Kha Kha merged commit 52b1b34 into master Oct 22, 2025
16 checks passed
@hargoniX hargoniX deleted the hbv/baseio_zero branch October 22, 2025 11:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog-compiler Compiler, runtime, and FFI changes-stage0 Contains stage0 changes, merge manually using rebase release-ci Enable all CI checks for a PR, like is done for releases toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants