Optimize forwarding in eq and ord plugins #252
Thanks for the very clear PR that looks high-quality (I haven't looked at the code yet). A quick comment:
In general, suppose we are transforming an expression (Other translations are possible, such as |
I have now read the code and I believe it is correct, thanks again! One minor remark: you left some comments |
Thanks for the feedback! I updated both the eta-expansion and the quoter comments to provide some explanation for the special optimized case.
Indeed, when trying to understand what this eta-expansion and quoting is about, I realized that eta-expansion on the outermost level was missing if a non- |
A couple of additional remarks:
|
Yes, but how would you robustly check for performance regression, in a way that does not depend on the system on which the tests are run? One way to do this would be to count bytecode instructions, but I don't know of an easy, practical way to track this. (You can run a bytecode binary with the instrumented runtime, |
I wondered the same and I agree. Your optimization makes sense because it corresponds to a well-understood use case (we are not optimizing, rather de-pessimizing). I would wait until a clear use case comes up for constants (or functions, etc.) before going further. |
Automating it is nontrivial indeed. Anyway, this goes beyond what currently would be needed, just a thought about how it might be done. |
Yes, having to update the "baseline" on a semi-regular basis is okay, but we want to avoid flaky tests that bring the CI down irregularly. If you want to try to work something out and submit a PR, that would be warmly welcome, but I am not sure how to get a robust test -- besides the (Maybe a simpler approach would be to just store the |
From: ocaml-ppx/ppx_deriving#252. Signed-off-by: Simmo Saan <[email protected]>
CHANGES:
* Port standard plugins to ppxlib registration and attributes ocaml-ppx/ppx_deriving#263 (Simmo Saan)
* Optimize forwarding in eq and ord plugins ocaml-ppx/ppx_deriving#252 (Simmo Saan)
* Delegate quoter to ppxlib ocaml-ppx/ppx_deriving#263 (Simmo Saan)
* Introduce `Ppx_deriving_runtime.Stdlib` with OCaml >= 4.07. This module already exists in OCaml < 4.07 but was missing otherwise. ocaml-ppx/ppx_deriving#258 (Kate Deplaix)

CHANGES:
* Fix a bug in `[@@deriving make]` that caused errors when it was used on a set of type declarations containing at least one non record type. ocaml-ppx/ppx_deriving#281 (@NathanReb)
* Embed errors instead of raising exceptions when generating code with `ppx_deriving.make` ocaml-ppx/ppx_deriving#281 (@NathanReb)
* Remove `[%derive.iter ...]`, `[%derive.map ...]` and `[%derive.fold ...]` extensions ocaml-ppx/ppx_deriving#278 (Simmo Saan)
* Port standard plugins to ppxlib registration and attributes ocaml-ppx/ppx_deriving#263 (Simmo Saan)
* Optimize forwarding in eq and ord plugins ocaml-ppx/ppx_deriving#252 (Simmo Saan)
* Delegate quoter to ppxlib ocaml-ppx/ppx_deriving#263 (Simmo Saan)
* Introduce `Ppx_deriving_runtime.Stdlib` with OCaml >= 4.07. This module already exists in OCaml < 4.07 but was missing otherwise. ocaml-ppx/ppx_deriving#258 (Kate Deplaix)
…rep-proprietary#2266)

## Goal

The goal of this PR is to reduce our total allocated memory by not inferring `compare : t -> t -> int` for our `Gensym` library.

## Why not ppx derive `compare` for `Gensym`?

TL;DR: the derived ppx compare allocates a closure.

In one of my traces, I found the following memory behavior:

<img width="482" alt="Screenshot 2024-09-18 at 11 17 01 AM" src="https://github.com/user-attachments/assets/1bc79b21-2925-4a2f-8d68-94204ad5bfbf">

[About 40 Gigabytes of 600 of this semgrep run is allocated in `IL.compare_name`; which is weird, as this function is just a string + Int comparison](https://github.com/semgrep/semgrep-proprietary/blob/80957a396f497e9b5906b6db018bdaef03a5f4d3/OSS/src/il/IL.ml#L111-L116C15), with no place to really allocate memory.

The source of all of these allocations was in the ppx derived `G.SId.compare`, which compares two ints as so:

```ocaml
let rec compare : t -> t -> Ppx_deriving_runtime.int =
  ((let __0 () (a : int) b = Ppx_deriving_runtime.compare a b in
    ((let open! ((Ppx_deriving_runtime)[@ocaml.warning "-A"]) in
      fun x -> (__0 ()) x)
      [@ocaml.warning "-A"]))
    [@ocaml.warning "-39"])
[@@ocaml.warning "-39"]
```

Which unnecessarily allocates a closure.

## Post ppx-unrolling

Manually implementing a compare however we get the following trace:

<img width="443" alt="Screenshot 2024-09-18 at 12 02 01 PM" src="https://github.com/user-attachments/assets/60aeb2aa-c9a7-44f0-b10b-05ff85f991d1">

## Notes

- There was a bug fix on ppx trying to resolve this exact issue (ocaml-ppx/ppx_deriving#252)
- All of these allocations end up in the minor heap, so it isn't much of a performance win, but we should get rid of it anyways

## Test plan

### Correctness

- `make test`

### performance

- run the command generated by `semgrep --pro --config tests/perf/rules/deepsemgrep-sqli-rules.yaml blaze-persistence -d --timeout=0` ([where blaze-persistence is this repo](https://github.com/Blazebit/blaze-persistence)), before and after introducing the fix w/Memtrace.

synced from Pro 69685213f1c376f8103846da4c36c9cc63e443ce
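The closure allocation described in the report above can be probed without Memtrace by counting minor-heap words. The following is only a sketch: `compare_derived` mimics the shape of the quoted generated code with the ppx runtime module replaced by the standard `compare`, and `compare_manual` and `minor_words_of` are hypothetical helpers introduced for illustration. The printed numbers depend on the compiler version, the backend (bytecode or native) and whether flambda inlines the partial application, so none are asserted here.

```ocaml
(* Shape of the derived comparison quoted above, minus the ppx runtime:
   [__0 ()] is a partial application, so each call may build a closure. *)
let compare_derived : int -> int -> int =
  let __0 () (a : int) b = compare a b in
  fun x -> (__0 ()) x

(* Hand-written equivalent (hypothetical name). *)
let compare_manual (a : int) (b : int) = compare a b

(* Minor-heap words allocated while running [f] [n] times.
   [Sys.opaque_identity] keeps the call from being optimized away. *)
let minor_words_of n f =
  let before = Gc.minor_words () in
  for _i = 1 to n do
    ignore (Sys.opaque_identity (f ()))
  done;
  Gc.minor_words () -. before

let () =
  let n = 100_000 in
  Printf.printf "derived: %.0f words\nmanual:  %.0f words\n"
    (minor_words_of n (fun () -> compare_derived 1 2))
    (minor_words_of n (fun () -> compare_manual 1 2))
```

Comparing the two counts before and after a change gives a rough, machine-independent signal, although it is no substitute for a full trace.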
After switching a large project from manual `equal` and `compare` implementations to ones derived by ppx_deriving (goblint/analyzer#227), we noticed significant performance decreases in those functions (goblint/analyzer#265).

Benchmarking the generated `equal` functions directly (https://github.com/goblint/analyzer/blob/a21f33511183074c693c15269990210b283ae045/bench/deriving/benchEq.ml, executable independently from our project) showed up to 3 times slowdown. This was the case, for example, when the old manual implementation was replaced with one derived for `Int.t * String.t`.

The modules `Int` and `String` are just chosen as examples; we often use functors where the modules come abstractly from arguments, so using primitive types isn't an option for deriving.

Inefficient derived code
Currently, the following implementation is derived:
The benchmarks confirm that this literal code performs the same as the derived one.
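A minimal sketch of the shape described in this section, assuming the pair example `Int.t * String.t`. The binder names `__0`, `__1`, `lhs0`, `rhs0` and the function names `equal_manual` and `equal_derived_old` are illustrative, patterned on the generated code quoted in the semgrep report rather than copied from the plugin's actual output.

```ocaml
(* Hand-written equality on a pair, as one might write it manually. *)
let equal_manual (a1, b1) (a2, b2) = Int.equal a1 a2 && String.equal b1 b2

(* Sketch of the pre-optimization derived shape: each forwarded function is
   quoted behind a unit argument and eta-expanded at every use site. *)
let equal_derived_old : Int.t * String.t -> Int.t * String.t -> bool =
  let __1 () = String.equal
  and __0 () = Int.equal in
  fun (lhs0, lhs1) (rhs0, rhs1) ->
    (fun x -> (__0 ()) x) lhs0 rhs0 && (fun x -> (__1 ()) x) lhs1 rhs1
```

Both functions compute the same result; the second merely goes through an extra unit application and an extra `fun` for each component.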
Optimizations
This PR suggests two optimizations for forwarding the `equal` calls.

Avoid unnecessary eta-expansion
PR #55 introduced eta-expansion to forwarded calls. This was necessary since derived implementations for mutual recursion in the generated `let rec` must be statically constructive (https://ocaml.org/manual/letrecvalues.html), so expanding a forwarding call into a `fun` works around the restriction.

The problem in this case is that the eta-expansion is added to all forwarding calls, including those deeper in the resulting `equal` implementation, where the restriction doesn't apply any more because the outermost expression is a `fun` already. AFAIK, eta-expansion is only needed at the top level if it's not a `fun` already (most cases of `expr_of_typ` are anyway). Therefore this PR implements exactly that. The linked benchmarks show a 50%…300% speedup from this change.

Avoid unnecessary closures in quoting
Issue #57 and commit c3bee7b introduced quoting to forwarded calls. The quoting mechanism introduces functions with a `()` argument. There is no explanation why, but I'm guessing it's to be lazy: in case the quoted expression involves some computation, then don't perform it immediately at the beginning (it might not be necessary due to short-circuiting, etc.) but perform it each time it's actually used, just as if the quoted expression were written independently at each use.

The problem in this case is that forward calls being quoted are just identifier expressions, which AFAIK cannot perform any computation on their own. Therefore this PR implements a special case that quotes plain identifiers without the `()` argument. The linked benchmarks show a 200%…400% speedup from this change on the pair examples. Just forwarding to a single function doesn't seem to gain any speedup, nor slowdown.

Efficient derived code
As a result of these changes, the new derived code for the example above is:
And overall the derived implementations seem to be at least 3 times faster, bringing them mostly on par with the manual implementations.
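A sketch of what the two optimizations amount to for the pair example, plus a small case where top-level eta-expansion is still required. Everything here is an assumption for illustration: the binder names, `equal_derived_new`, and the types `t` and `u` are hypothetical, and `List.equal` requires OCaml 4.12 or later.

```ocaml
(* Sketch of the post-optimization shape: identifiers are quoted without the
   unit argument, and inner forwarding calls are not eta-expanded. *)
let equal_derived_new : Int.t * String.t -> Int.t * String.t -> bool =
  let __1 = String.equal
  and __0 = Int.equal in
  fun (lhs0, lhs1) (rhs0, rhs1) -> __0 lhs0 rhs0 && __1 lhs1 rhs1

(* Hypothetical mutually recursive types. *)
type t = A | B of u
and u = t list

let rec equal_t (x : t) (y : t) =
  match x, y with
  | A, A -> true
  | B a, B b -> equal_u a b
  | _ -> false

(* The body of [equal_u] is a bare forwarding call, so it is eta-expanded
   at the top level. Writing [and equal_u = List.equal equal_t] instead is
   rejected by the compiler, because that right-hand side is not statically
   constructive inside a [let rec]. *)
and equal_u = fun x -> List.equal equal_t x
```

The eta-expansion in `equal_u` is the only one needed: inside `equal_t` the outermost expression is already a function, so the call to `equal_u` can stay direct.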