-
Notifications
You must be signed in to change notification settings - Fork 687
feat: zero cost BaseIO #10625
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: zero cost BaseIO #10625
Conversation
|
!bench |
|
(I expect a performance regression from this, just want to see by how much) |
|
Here are the benchmark results for commit 9088219. Benchmark Metric Change
==================================================================================
- Init.Data.List.Sublist async branches 5.7% (40.2 σ)
- Init.Data.List.Sublist async instructions 5.1% (33.0 σ)
- Init.Prelude async branches 5.0% (27.1 σ)
- Init.Prelude async instructions 4.6% (22.9 σ)
- Std.Data.DHashMap.Internal.RawLemmas branch-misses 8.3% (25.8 σ)
- Std.Data.DHashMap.Internal.RawLemmas branches 4.6% (39.1 σ)
- Std.Data.DHashMap.Internal.RawLemmas instructions 4.3% (33.6 σ)
- Std.Data.Internal.List.Associative wall-clock 3.7% (28.4 σ)
- big_beq branch-misses 6.8% (31.7 σ)
- big_beq branches 4.7% (234.9 σ)
- big_beq instructions 4.3% (233.1 σ)
- big_beq maxrss 1.6% (26.1 σ)
- big_beq_rec branch-misses 7.4% (37.5 σ)
- big_beq_rec branches 3.3% (235.7 σ)
- big_beq_rec instructions 3.1% (219.5 σ)
- big_deceq branch-misses 5.9% (25.0 σ)
- big_deceq branches 5.8% (118.9 σ)
- big_deceq instructions 5.4% (107.1 σ)
- big_deceq_rec branch-misses 6.8% (129.7 σ)
- big_deceq_rec branches 5.7% (294.3 σ)
- big_deceq_rec instructions 5.2% (297.1 σ)
- big_do branches 6.1% (816.0 σ)
- big_do instructions 5.7% (721.7 σ)
- big_match branch-misses 6.8% (25.3 σ)
- big_match branches 5.8% (3470.6 σ)
- big_match instructions 5.3% (3023.6 σ)
- big_match maxrss 1.5% (144.4 σ)
- big_omega.lean branches 5.5% (1069.0 σ)
- big_omega.lean instructions 4.6% (255.3 σ)
- big_omega.lean MT branches 5.4% (479.8 σ)
- big_omega.lean MT instructions 4.6% (444.5 σ)
- bv_decide_mul maxrss 1.6% (28.2 σ)
- bv_decide_rewriter.lean branch-misses 21.0% (44.5 σ)
- bv_decide_rewriter.lean branches 8.0% (1192.9 σ)
- bv_decide_rewriter.lean instructions 7.5% (968.8 σ)
- bv_decide_rewriter.lean task-clock 14.3% (67.0 σ)
- bv_decide_rewriter.lean wall-clock 14.3% (35.8 σ)
- grind_bitvec2.lean branch-misses 10.3% (21.2 σ)
- grind_bitvec2.lean branches 7.2% (32.2 σ)
- grind_bitvec2.lean instructions 6.5% (27.7 σ)
- grind_list2.lean branch-misses 11.2% (32.6 σ)
- grind_list2.lean branches 8.6% (90.2 σ)
- grind_list2.lean instructions 7.9% (74.6 σ)
- grind_ring_5.lean branch-misses 5.5% (46.4 σ)
- grind_ring_5.lean branches 1.4% (215.2 σ)
- grind_ring_5.lean instructions 1.0% (159.2 σ)
- hashmap.lean branches 4.6% (2595.1 σ)
- hashmap.lean instructions 4.9% (2314.1 σ)
- identifier auto-completion branches 1.1% (47.2 σ)
- identifier auto-completion instructions 1.1% (40.2 σ)
- import Lean branches 1.6% (29.8 σ)
- import Lean instructions 1.5% (23.6 σ)
- iterators (elab) branch-misses 12.0% (40.5 σ)
- iterators (elab) branches 11.2% (502.9 σ)
- iterators (elab) instructions 11.3% (421.8 σ)
- iterators (interpreted) branches 1.6% (355.7 σ)
- iterators (interpreted) instructions 1.2% (297.9 σ)
- lake build clean instructions 1.6% (326.5 σ)
- lake config elab instructions 11.2% (571.9 σ)
- lake config elab maxrss 1.4% (38.7 σ)
- lake config import instructions 1.6% (38.9 σ)
- lake config tree instructions 1.7% (190.4 σ)
- lake env instructions 1.7% (32.3 σ)
- language server startup branches 5.2% (107.8 σ)
- language server startup instructions 5.0% (83.5 σ)
- libleanshared.so binary size 13.1%
- mut_rec_wf branch-misses 9.5% (48.6 σ)
- mut_rec_wf branches 5.8% (494.8 σ)
- mut_rec_wf instructions 5.3% (405.0 σ)
- mut_rec_wf maxrss 1.8% (21.2 σ)
- omega_stress.lean async branches 4.8% (135.6 σ)
- omega_stress.lean async instructions 4.2% (106.9 σ)
- omega_stress.lean async maxrss 2.6% (36.8 σ)
- phashmap.lean branches 2.5% (398.5 σ)
- phashmap.lean instructions 2.5% (318.1 σ)
- qsort instructions 2.2% (749.7 σ)
- reduceMatch instructions 3.4% (87.1 σ)
- riscv-ast.lean branch-misses 13.6% (68.4 σ)
- riscv-ast.lean branches 7.3% (1228.2 σ)
- riscv-ast.lean instructions 6.8% (948.7 σ)
- riscv-ast.lean task-clock 10.1% (46.8 σ)
- riscv-ast.lean wall-clock 10.9% (71.4 σ)
- simp_arith1 branches 4.6% (58.0 σ)
- simp_arith1 instructions 4.1% (38.3 σ)
- simp_bubblesort_256 branch-misses 47.1% (75.3 σ)
- simp_bubblesort_256 branches 10.9% (633.2 σ)
- simp_bubblesort_256 instructions 10.2% (611.5 σ)
- simp_congr branch-misses 14.7% (40.0 σ)
- simp_congr branches 3.4% (92.3 σ)
- simp_congr instructions 3.0% (78.6 σ)
- simp_local branch-misses 77.0% (37.0 σ)
- simp_local branches 14.1% (1085.4 σ)
- simp_local instructions 13.6% (835.1 σ)
- simp_subexpr branch-misses 44.7% (31.1 σ)
- simp_subexpr branches 10.8% (165.2 σ)
- simp_subexpr instructions 10.1% (132.7 σ)
- stdlib compilation (LCNF base) 30.2% (48.7 σ)
- stdlib compilation (LCNF mono) 24.0% (20.8 σ)
- stdlib congr simp thm 7.1% (22.5 σ)
- stdlib grind ac 48.0% (24.3 σ)
- stdlib grind ring 31.7% (27.0 σ)
- stdlib instructions 8.7% (300.3 σ)
- stdlib longest build path 11.8% (36.8 σ)
- stdlib longest rebuild path 10.8% (39.4 σ)
- stdlib process pre-definitions 3.5% (23.6 σ)
- stdlib task-clock 6.8% (35.2 σ)
- stdlib wall-clock 12.3% (80.7 σ)
- stdlib size bytes .ir 10.9%
- stdlib size bytes .olean.private 2.9%
- stdlib size lines C 15.4%
- tests/bench/ interpreted maxrss 1.5% (67.0 σ)
- tests/compiler sum binary sizes 11.6%
- treemap.lean branches 1.1% (1530.3 σ)
- treemap.lean iterate 53.1% (78.3 σ)
- workspaceSymbols maxrss 1.6% (23.0 σ)
+ workspaceSymbols with new ranges instructions -1.2% (-62.4 σ) |
8dc7ac4 to
3a10225
Compare
|
!bench |
|
Here are the benchmark results for commit 3a10225. Benchmark Metric Change
======================================================================================
- Init.Data.BitVec.Lemmas maxrss 71.5% (29.8 σ)
- Init.Data.BitVec.Lemmas re-elab maxrss 20.5% (74.2 σ)
- Init.Data.List.Sublist re-elab -j4 maxrss 175.3% (291.8 σ)
- Std.Data.DHashMap.Internal.RawLemmas maxrss 141.1% (21.8 σ)
- Std.Data.Internal.List.Associative maxrss 98.0% (98.1 σ)
+ big_beq branches -1.1% (-69.6 σ)
- big_beq maxrss 17.4% (349.9 σ)
- big_beq_rec maxrss 22.2% (240.0 σ)
- big_deceq maxrss 7.3% (78.7 σ)
- big_deceq_rec maxrss 10.3% (263.5 σ)
+ big_do branches -2.0% (-105.6 σ)
+ big_do instructions -2.0% (-84.3 σ)
- big_do maxrss 30.9% (119.7 σ)
- big_match maxrss 13.8% (199.9 σ)
- big_omega.lean maxrss 5.0% (24.6 σ)
- big_omega.lean MT maxrss 4.7% (87.4 σ)
- binarytrees maxrss 13.2% (285.9 σ)
+ bv_decide_mod maxrss -1.1% (-86.7 σ)
- bv_decide_rewriter.lean maxrss 8.1% (24.6 σ)
- grind_bitvec2.lean maxrss 97.9% (104.5 σ)
- grind_list2.lean maxrss 39.7% (32.5 σ)
+ iterators (elab) branches -7.8% (-547.2 σ)
+ iterators (elab) instructions -5.0% (-297.4 σ)
- iterators (elab) maxrss 103.6% (309.8 σ)
+ iterators (interpreted) branches -1.1% (-527.8 σ)
- iterators (interpreted) maxrss 102.7% (1360.2 σ)
- lake config elab instructions 1.9% (21.2 σ)
- lake config elab maxrss 15.5% (157.7 σ)
+ libleanshared.so binary size -5.2%
- mut_rec_wf maxrss 17.6% (144.0 σ)
+ nat_repr wall-clock -5.3% (-23.8 σ)
+ omega_stress.lean async branch-misses -2.5% (-36.1 σ)
- phashmap.lean branches 1.9% (191.1 σ)
- phashmap.lean instructions 1.9% (155.0 σ)
- phashmap.lean iterate 22.6% (31.5 σ)
- qsort instructions 2.1% (825.2 σ)
+ rbmap_library instructions -1.2% (-814.4 σ)
+ riscv-ast.lean branches -1.1% (-80.0 σ)
- riscv-ast.lean maxrss 114.5% (171.4 σ)
+ simp_arith1 branches -1.8% (-299.1 σ)
- simp_arith1 maxrss 13.8% (38.1 σ)
+ simp_bubblesort_256 branches -1.2% (-27.1 σ)
+ simp_bubblesort_256 instructions -1.6% (-28.7 σ)
+ simp_local branches -3.6% (-202.7 σ)
+ simp_local instructions -4.4% (-184.4 σ)
- simp_local maxrss 21.9% (124.0 σ)
+ simp_subexpr branch-misses -9.7% (-21.1 σ)
+ simp_subexpr branches -1.2% (-57.7 σ)
+ simp_subexpr instructions -1.7% (-69.5 σ)
- stdlib compilation (IR) 6.1% (76.2 σ)
- stdlib compilation (LCNF base) 26.5% (37.7 σ)
- stdlib compilation (LCNF mono) 13.7% (49.2 σ)
+ stdlib grind ematch -3.0% (-98.5 σ)
- stdlib instantiate metavars 14.7% (42.6 σ)
- stdlib let-to-have transformation 6.8% (23.9 σ)
- stdlib maxrss 723.6% (43831.4 σ)
- stdlib wall-clock 8.3% (39.0 σ)
+ stdlib size bytes .ir -7.3%
- stdlib size bytes .olean.private 2.9%
+ stdlib size lines C -7.2%
+ tests/compiler sum binary sizes -4.6%
- workspaceSymbols maxrss 1.8% (22.2 σ) |
|
!radar |
|
Benchmarking finished, results. |
3a10225 to
0e5100c
Compare
|
!bench |
|
Benchmark results for 0e5100c against 663d4d2 are in! @hargoniX |
|
Here are the benchmark results for commit 0e5100c. Benchmark Metric Change
================================================================================
- Init.Data.BitVec.Lemmas maxrss 72.9% (65.3 σ)
- Init.Data.List.Sublist re-elab -j4 maxrss 172.3% (430.4 σ)
- Init.Prelude async maxrss 67.9% (31.0 σ)
- Std.Data.Internal.List.Associative maxrss 99.2% (41.1 σ)
+ big_beq branches -1.6% (-72.6 σ)
- big_beq maxrss 16.1% (89.9 σ)
- big_beq_rec maxrss 20.7% (160.5 σ)
+ big_deceq branches -1.6% (-113.9 σ)
+ big_deceq instructions -1.1% (-82.3 σ)
- big_deceq maxrss 6.4% (36.9 σ)
+ big_deceq_rec branches -1.6% (-218.6 σ)
+ big_deceq_rec instructions -1.1% (-144.4 σ)
- big_deceq_rec maxrss 9.0% (44.1 σ)
- big_do maxrss 30.2% (101.3 σ)
- big_do task-clock 4.2% (23.2 σ)
- big_do wall-clock 4.3% (21.1 σ)
- big_match maxrss 10.5% (128.4 σ)
- big_omega.lean maxrss 5.3% (35.9 σ)
- binarytrees maxrss 13.2% (148.2 σ)
- bv_decide_rewriter.lean maxrss 8.4% (577.7 σ)
- grind_bitvec2.lean maxrss 97.1% (44.3 σ)
- grind_list2.lean maxrss 38.7% (39.6 σ)
- identifier auto-completion maxrss 5.6% (173.0 σ)
+ iterators (elab) branches -9.9% (-490.2 σ)
+ iterators (elab) instructions -7.2% (-274.6 σ)
- iterators (elab) maxrss 102.1% (504.7 σ)
+ iterators (interpreted) branches -1.5% (-548.6 σ)
- iterators (interpreted) maxrss 100.9% (213.4 σ)
+ lake config elab instructions -3.0% (-257.2 σ)
- lake config elab maxrss 14.6% (459.4 σ)
+ libleanshared.so binary size -2.2%
- mut_rec_wf maxrss 15.0% (55.6 σ)
- riscv-ast.lean maxrss 114.2% (321.8 σ)
+ simp_arith1 branches -2.8% (-96.3 σ)
+ simp_arith1 instructions -1.8% (-36.5 σ)
- simp_arith1 maxrss 13.6% (40.6 σ)
+ simp_bubblesort_256 branches -1.0% (-41.8 σ)
+ simp_bubblesort_256 instructions -1.4% (-36.7 σ)
+ simp_local branches -2.6% (-587.1 σ)
+ simp_local instructions -3.2% (-964.5 σ)
- simp_local maxrss 23.3% (221.4 σ)
+ simp_subexpr branches -1.1% (-35.1 σ)
+ simp_subexpr instructions -1.5% (-40.5 σ)
- stdlib compilation (LCNF base) 8.3% (86.8 σ)
- stdlib compilation (LCNF mono) 7.9% (22.8 σ)
+ stdlib instructions -2.0% (-28.6 σ)
- stdlib maxrss 682.1% (2310.9 σ)
- stdlib process pre-definitions 5.7% (32.9 σ)
- stdlib task-clock 2.3% (24.6 σ)
+ stdlib size bytes .ir -2.1%
+ stdlib size lines C -2.2%
+ tests/compiler sum binary sizes -2.0%
+ treemap.lean iterate -11.6% (-23.8 σ)
- workspaceSymbols maxrss 1.2% (52.5 σ) |
0e5100c to
4afc85f
Compare
|
!radar |
|
Benchmark results for 4afc85f against 663d4d2 are in! @hargoniX |
b45e7fb to
d4830d3
Compare
|
!radar |
|
Benchmark results for d4830d3 against 663d4d2 are in! @hargoniX |
d4830d3 to
4b8007e
Compare
|
!radar |
|
Benchmark results for 4b8007e against 663d4d2 are in! @hargoniX |
2bb894e to
930c82e
Compare
cdf0b85 to
c446880
Compare
|
Mathlib CI status (docs):
|
|
Reference manual CI status:
|
943cd75 to
fd243fc
Compare
|
!radar |
|
Benchmark results for fd243fc against bd0b91d are in! @hargoniX |
fd243fc to
3450d30
Compare
This PR implements zero cost
BaseIOby erasing theIO.RealWorldparameter from argument lists and structures. This is a major breaking change for FFI.Concretely:
BaseIOis defined in terms ofST IO.RealWorldEIO(and thusIO) is defined in terms ofEST IO.RealWorldVoidtype is introduced and the trivial structure optimization updated to account for it. Furthermore, arguments of typeVoid sare removed from the argument lists of the C functions.STis redefined asVoid s -> ST.Out s awhereST.Outis a pair ofVoid sandaThis together has the following major effects on our generated code:
BaseIO/ST/EIO/IO/ESTnow do not take the dummy world parameter anymore. To account for this FFI code needs to delete the dummy world parameter from the argument lists.BaseIO/STnow return their wrapped value directly. In particularBaseIO UInt32now returns auint32_tinstead of alean_object*. To account for this FFI code might have to change the return type and does not need to calllean_io_result_mk_okanymore but can instead justreturnvalues right away (same with extracting values fromBaseIOcomputations.EIO/IO/ESTnow only return the equivalent of anExceptnode which reduces the allocation size. Thelean_io_result_mk_ok/lean_io_result_mk_errorfunctions were updated to account for this already so no change is required.Besides improving performance by dropping allocation (sizes) we can now also do fun new things such as:
@[extern "malloc"] opaque malloc (size : USize) : BaseIO USize