Optimize forwarding in eq and ord plugins #252

sim642 · 2021-08-13T09:21:19Z

After switching a large project from manual equal and compare implementations to ones derived by ppx_deriving (goblint/analyzer#227), we noticed significant performance decreases in in those functions (goblint/analyzer#265).

Benchmarking the generated equal functions directly (https://github.com/goblint/analyzer/blob/a21f33511183074c693c15269990210b283ae045/bench/deriving/benchEq.ml, executable independently from our project) showed up to 3 times slowdown. For example, when the old manual implementation

let equal (x1, y1) (x2, y2) = Int.equal x1 x2 && String.equal y1 y2

was replaced with one derived for Int.t * String.t.
The modules Int and String are just chosen as example, we often use functors where the modules come abstractly from arguments, so using primitive types isn't an option for deriving.

Inefficient derived code

Currently, the following implementation is derived:

let equal =
  let __1 () = String.equal
  and __0 () = Int.equal in
  ((let open! ((Ppx_deriving_runtime)[@ocaml.warning "-A"]) in
    fun (lhs0, lhs1) ->
      fun (rhs0, rhs1) ->
        ((fun x -> (__0 ()) x) lhs0 rhs0) &&
          ((fun x -> (__1 ()) x) lhs1 rhs1))
  [@ocaml.warning "-A"])

The benchmarks confirm that this literal code performs the same as the derived one.

Optimizations

This PR suggests two optimizations for forwarding the equal calls.

Avoid unnecessary eta-expansion

PR #55 introduced eta-expansion to forwarded calls. This was necessary since derived implementations for mutual recursion in the generated let rec must be statically constructive (https://ocaml.org/manual/letrecvalues.html), so expanding a forwarding call into a fun works around the restriction.

The problem in this case is that the eta-expansion is added to all forwarding calls, including those deeper in the resulting equal implementation, where the restriction doesn't apply any more because the outermost expression is a fun already. AFAIK, eta-expansion is only needed at the top level if it's not a fun already (most cases of expr_of_typ are anyway). Therefore this PR implements exactly that. The linked benchmarks show a 50%…300% speedup from this change.

Avoid unnecessary closures in quoting

Issue #57 and commit c3bee7b introduced quoting to forwarded calls. The quoting mechanism introduces functions with () argument. There is no explanation why, but I'm guessing it's to be lazy: in case the quoted expression involves some computation, then don't perform it immediately at the beginning (it might not be necessary due to short-circuiting, etc) but perform it each time it's actually used, just like if the quoted expression where independently at each use.

The problem in this case is that forward calls being quoted are just identifier expressions, which AFAIK cannot be performing any computation on their own. Therefore this PR implements a special case of quoting just identifiers to be without the () argument. The linked benchmarks show a 200%…400% speedup from this change on the pair examples. Just forwarding to a single function doesn't seem to gain any speedup, nor slowdown.

Efficient derived code

As a result of these changes, the new derived code for the example above is:

let equal =
  let __1 = String.equal
  and __0 = Int.equal in
  ((let open! ((Ppx_deriving_runtime)[@ocaml.warning "-A"]) in
      fun (lhs0, lhs1) ->
        fun (rhs0, rhs1) -> (__0 lhs0 rhs0) && (__1 lhs1 rhs1))
    [@ocaml.warning "-A"])

And overall the derived implementations seem to be at least 3 times faster, bringing them mostly on par with the manual implementations.

gasche · 2021-08-13T13:30:08Z

Thanks for the very clear PR that looks high-quality (I haven't looked at the code yet).

A quick comment:

The quoting mechanism introduces functions with () argument. There is no explanation why, but I'm guessing it's to be lazy: in case the quoted expression involves some computation, then don't perform it immediately at the beginning (it might not be necessary due to short-circuiting, etc) but perform it each time it's actually used, just like if the quoted expression where independently at each use.

In general, suppose we are transforming an expression C[e] (a subexpression e inside a wider context C[...]), where e is some source expression that we want to transform into a desugared/transformed piece of code e´, where e' should be evaluated in the current scope (for example, it refers to some identifier x that we know scopes over C[e]. Transforming this into C[e'] would be incorrect (un-hygienic), because the context C[...] written by the user could contain binders that shadow this x with a different definition; for example C[...] could be fun x -> ... or let open Foo in ..., where Foo defines an x. So instead one may want to translate this term into let __private_identifier = e' in C[__private_identifier], but this is incorrect because it changes the order of evaluation (or the number of types e' is evaluated, etc.). On the other hand, the translation let __private_identifier () = e' in C[__private_identifier ()] is always correct, scope-wise and evaluation-wise. Of course, as you point out it introduces constant-factor inefficiencies in general, and is not necessary when e´ is known to be a constant or an identifier (its evaluation order does not matter, and evaluating it when it is unnecessary does not degrade performances).

(Other translations are possible, such as let c x = C[x] in c e', or let x = e' and c x = C[x] in c x, which do not require generating "private" identifiers; but we digress, and those introduce a function call that is harder to simplify away with local reasoning.)

gasche · 2021-08-13T13:35:42Z

I have now read the code and I believe it is correct, thanks again!

One minor remark: you left some comments (* eta-expanded if outermost *) in two places in eq and ord each. I'm not sure those leftover comments (in one case a trace of a previous workaround, in the other a bug opportunity that you noticed) really help code understanding. Putting myself in the shoes of a newcomer, it's not clear what those comments mean, and they seem redundant with the comment that exists at the place where eta-expansion does actually take place. I would remove those extraneous comments, and enrich the comment on the actual eta-expansion site with a reference to the present PR that contains a detailed description for more context.

sim642 · 2021-08-13T15:37:43Z

Thanks for the feedback! I updated both the eta-expansion and the quoter comments to provide some explanation for the special optimized case.

in the other a bug opportunity that you noticed

Indeed, when trying to understand what this eta-expansion and quoting is about, I realized that eta-expansion on the outermost level was missing if a non-fun expression was explicitly provided via [@equal …] or [@compare …]. It took me a while to engineer the added tests because it only matters in the extremely obscure case when such expression refers to another derived one in the same mutually recursive definition (otherwise it's statically constructive by the simplest case) and that requires explicitly referencing the other to-be derived function.

sim642 · 2021-08-13T15:48:09Z

A couple of additional remarks:

Pexp_ident probably isn't the only case where the quote optimization can be done. For example, I think Pexp_constant and maybe some other combinations could also allow it. But I didn't want to attempt to implement a function to determine the purity of an expression by syntax and Pexp_ident covers the most likely use case of quoting.
This repository doesn't contain any benchmark code but if there's interest, I could also include that to continue living in this repository, since as it turns out small optimizations to derived code can make a big impact.

gasche · 2021-08-13T16:59:54Z

This repository doesn't contain any benchmark code but if there's interest, I could also include that to continue living in this repository, since as it turns out small optimizations to derived code can make a big impact.

Yes, but how would you robustly check for performance regression, in a way that does not depend on the system on which the tests are run? One way to do this would be to count bytecode instructions, but I don't know of an easy, practical way to track this. (You can run a bytecode binary with the instrumented runtime, ocamlrund -t foo.bc | wc -l, but it's a bit cumbersome to setup.)

gasche · 2021-08-13T17:01:22Z

Pexp_ident probably isn't the only case where the quote optimization can be done. For example, I think Pexp_constant and maybe some other combinations could also allow it. But I didn't want to attempt to implement a function to determine the purity of an expression by syntax and Pexp_ident covers the most likely use case of quoting.

I wondered the same and I agree. Your optimization makes sense because it corresponds to a well-understood use case (we are not optimized, rather de-pessimizing). I would wait until a clear use-case comes for constants (or functions, etc.) before going further.

sim642 · 2021-08-13T17:24:03Z

Yes, but how would you robustly check for performance regression, in a way that does not depend on the system on which the tests are run? One way to do this would be to count bytecode instructions, but I don't know of an easy, practical way to track this. (You can run a bytecode binary with the instrumented runtime, ocamlrund -t foo.bc | wc -l, but it's a bit cumbersome to setup.)

Automating it is nontrivial indeed.
I suppose one way would be to have an explicit copy of the last known good (i.e. with expected performance) derived code in the benchmark and have another one be derived via the current deriver implementation. Then check if the results significantly differ (beyond some margin of error) to automatically catch pessimization or optimization. And when intentional optimizations are made, just update the explicit copy from there on.

Anyway, this goes beyond what currently would be needed, just a thought about how it might be done.

gasche · 2021-08-13T19:50:09Z

Yes, having to update the "baseline" on a semi-regular basis is okay, but we want to avoid flaky tests that bring the CI down irregularly. If you want to try to work something out and submit a PR, that would be warmly welcome, but I am not sure how to get a robust test -- besides the ocamlrund trick I mentioned.

(Maybe a simpler approach would be to just store the -dparsetree output of the transformation, so that the test fails and has to be promoted whenever the generated code changes. When people change this test output we can think about the performance difference, possibly by running some benchmarks.)

From: ocaml-ppx/ppx_deriving#252.

From: ocaml-ppx/ppx_deriving#252. Signed-off-by: Simmo Saan <[email protected]>

CHANGES: * Port standard plugins to ppxlib registration and attributes ocaml-ppx/ppx_deriving#263 (Simmo Saan) * Optimize forwarding in eq and ord plugins ocaml-ppx/ppx_deriving#252 (Simmo Saan) * Delegate quoter to ppxlib ocaml-ppx/ppx_deriving#263 (Simmo Saan) * Introduce `Ppx_deriving_runtime.Stdlib` with OCaml >= 4.07. This module already exists in OCaml < 4.07 but was missing otherwise. ocaml-ppx/ppx_deriving#258 (Kate Deplaix)

@NathanReb

CHANGES: * Fix a bug in `[@@deriving make]` that caused errors when it was used on a set of type declarations containing at least one non record type. ocaml-ppx/ppx_deriving#281 (@NathanReb) * Embed errors instead of raising exceptions when generating code with `ppx_deriving.make` ocaml-ppx/ppx_deriving#281 (@NathanReb) * Remove `[%derive.iter ...]`, `[%derive.map ...]` and `[%derive.fold ...]` extensions ocaml-ppx/ppx_deriving#278 (Simmo Saan) * Port standard plugins to ppxlib registration and attributes ocaml-ppx/ppx_deriving#263 (Simmo Saan) * Optimize forwarding in eq and ord plugins ocaml-ppx/ppx_deriving#252 (Simmo Saan) * Delegate quoter to ppxlib ocaml-ppx/ppx_deriving#263 (Simmo Saan) * Introduce `Ppx_deriving_runtime.Stdlib` with OCaml >= 4.07. This module already exists in OCaml < 4.07 but was missing otherwise. ocaml-ppx/ppx_deriving#258 (Kate Deplaix)

…rep-proprietary#2266) ## Goal The goal of this PR is to reduce our total allocated memory by not infering `compare : t -> t -> int` for our `Gensym` library. ## Why not ppx derive `compare` for `Gensym`? TL;DR: the derived ppx compare allocates a closure In one of my traces, I found the following memory behavior: <img width="482" alt="Screenshot 2024-09-18 at 11 17 01 AM" src="https://github.com/user-attachments/assets/1bc79b21-2925-4a2f-8d68-94204ad5bfbf"> [About 40 Gigabytes of 600 of this semgrep run is allocated in `IL.compare_name`; which is weird, as this function is just a string + Int comparison](https://github.com/semgrep/semgrep-proprietary/blob/80957a396f497e9b5906b6db018bdaef03a5f4d3/OSS/src/il/IL.ml#L111-L116C15), with no place to really allocate memory. The source of all of these allocations was in the ppx derived `G.SId.compare`, which compares two ints as so: ```ocaml let rec compare : t -> t -> Ppx_deriving_runtime.int = ((let __0 () (a : int) b = Ppx_deriving_runtime.compare a b in ((let open! ((Ppx_deriving_runtime)[@ocaml.warning "-A"]) in fun x -> (__0 ()) x) [@ocaml.warning "-A"])) [@ocaml.warning "-39"])[@@ocaml.warning "-39"] ``` Which unnecessarily allocates a closure. ## Post ppx-unrolling Manually implementing a compare however we get the following trace: <img width="443" alt="Screenshot 2024-09-18 at 12 02 01 PM" src="https://github.com/user-attachments/assets/60aeb2aa-c9a7-44f0-b10b-05ff85f991d1"> ## Notes - There was a bug fix on ppx trying to resolve this exact issue (ocaml-ppx/ppx_deriving#252) - All of these allocations end up in the minor heap, so it isn't much of a performance win, but we should get rid of it anyways ## Test plan ### Correctness - `make test` ### performance - run the command generated by `semgrep --pro --config tests/perf/rules/deepsemgrep-sqli-rules.yaml blaze-persistence -d --timeout=0` ([where blaze-persistance is this repo ](https://github.com/Blazebit/blaze-persistence)), before and after introducing the fix w/Memtrace. synced from Pro 69685213f1c376f8103846da4c36c9cc63e443ce

sim642 added 5 commits August 12, 2021 15:51

Add eq test which requires eta-expansion for custom equal

e5dae86

Fix eq eta-expansion for custom equal

3b6f097

Optimize eq eta-expansion to apply only to outermost expression

3ebb67d

Optimize quoting of ident expressions

ee3f44c

Port eta-expansion optimization to ord

50bb709

sim642 mentioned this pull request Aug 13, 2021

Pin ppx_deriving fork with forwarding optimizations goblint/analyzer#326

Merged

gasche approved these changes Aug 13, 2021

View reviewed changes

sim642 added 2 commits August 13, 2021 18:27

Update eq and ord eta-expansion comments

61bde2f

Comment quote optimization

9f41915

gasche merged commit 0a89b61 into ocaml-ppx:master Aug 13, 2021

sim642 mentioned this pull request Nov 25, 2021

Release v1.1.x on opam repository goblint/analyzer#282

Closed

sim642 mentioned this pull request Jan 18, 2022

Ppx_deriving utility functions ocaml-ppx/ppxlib#317

Open

3 tasks

sim642 added a commit to sim642/ppxlib that referenced this pull request Mar 20, 2022

Port quoter ident optimization from ppx_deriving

470308e

From: ocaml-ppx/ppx_deriving#252.

sim642 mentioned this pull request Mar 20, 2022

Fix and optimize quoter ocaml-ppx/ppxlib#327

Merged

sim642 added a commit to sim642/ppxlib that referenced this pull request Mar 20, 2022

Port quoter ident optimization from ppx_deriving

1bbd890

From: ocaml-ppx/ppx_deriving#252. Signed-off-by: Simmo Saan <[email protected]>

sim642 added a commit to sim642/ppxlib that referenced this pull request Mar 20, 2022

Port quoter ident optimization from ppx_deriving

5f55b77

From: ocaml-ppx/ppx_deriving#252. Signed-off-by: Simmo Saan <[email protected]>

sim642 added a commit to sim642/ppxlib that referenced this pull request Mar 26, 2022

Port quoter ident optimization from ppx_deriving

52b566e

From: ocaml-ppx/ppx_deriving#252. Signed-off-by: Simmo Saan <[email protected]>

sim642 added a commit to sim642/ppxlib that referenced this pull request Mar 26, 2022

Port quoter ident optimization from ppx_deriving

8d9259a

From: ocaml-ppx/ppx_deriving#252. Signed-off-by: Simmo Saan <[email protected]>

sim642 added a commit to sim642/ppxlib that referenced this pull request Apr 8, 2022

Port quoter ident optimization from ppx_deriving

03234f1

From: ocaml-ppx/ppx_deriving#252. Signed-off-by: Simmo Saan <[email protected]>

sim642 mentioned this pull request Nov 5, 2023

Port standard plugins to ppxlib registration and attributes #263

Merged

2 tasks

sim642 mentioned this pull request Mar 8, 2024

Add missing unreleased CHANGELOG entries #275

Merged

NathanReb mentioned this pull request Mar 12, 2024

[new release] ppx_deriving (5.3.0) ocaml/opam-repository#25458

Closed

NathanReb mentioned this pull request Apr 15, 2024

[new release] ppx_deriving (6.0.0) ocaml/opam-repository#25675

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Optimize forwarding in eq and ord plugins #252

Optimize forwarding in eq and ord plugins #252

Uh oh!

sim642 commented Aug 13, 2021 •

edited

Loading

Uh oh!

gasche commented Aug 13, 2021 •

edited

Loading

Uh oh!

gasche commented Aug 13, 2021

Uh oh!

sim642 commented Aug 13, 2021

Uh oh!

sim642 commented Aug 13, 2021

Uh oh!

gasche commented Aug 13, 2021

Uh oh!

gasche commented Aug 13, 2021 •

edited

Loading

Uh oh!

sim642 commented Aug 13, 2021

Uh oh!

gasche commented Aug 13, 2021

Uh oh!

Uh oh!

Optimize forwarding in eq and ord plugins #252

Optimize forwarding in eq and ord plugins #252

Uh oh!

Conversation

sim642 commented Aug 13, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Inefficient derived code

Optimizations

Avoid unnecessary eta-expansion

Avoid unnecessary closures in quoting

Efficient derived code

Uh oh!

gasche commented Aug 13, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gasche commented Aug 13, 2021

Uh oh!

sim642 commented Aug 13, 2021

Uh oh!

sim642 commented Aug 13, 2021

Uh oh!

gasche commented Aug 13, 2021

Uh oh!

gasche commented Aug 13, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sim642 commented Aug 13, 2021

Uh oh!

gasche commented Aug 13, 2021

Uh oh!

Uh oh!

sim642 commented Aug 13, 2021 •

edited

Loading

gasche commented Aug 13, 2021 •

edited

Loading

gasche commented Aug 13, 2021 •

edited

Loading