Proposed structure for integration of benchmarks with pytest-benchmark #128
base: dev
Conversation
I think the basic structure of this looks good to me! I like being able to integrate this into pytest.
Force-pushed from 5dbcf66 to f3003e8.
I also added an abstract base class for benchmark problems, and implemented this for our very favourite Rosenbrock function.
Maybe this would work if it was defined in a benchmarks/conftest.py (outside of tests)?
CONTRIBUTING.md (outdated)

benchmark tests with
```bash
pytest --extensive
```
Maybe `--benchmark`?
I think there is too much confusion with the built-in flags then? We already have `--benchmark-save`, which is built-in and specifies where to save the results in benchmark format, and we also use the built-in benchmarking fixture with `@pytest.mark.benchmark`.
I think that means that our own decorator and flags should be different! Maybe `optx_benchmarks`?
Or `@cutest` and `--cutest`? This will work fine until we integrate other benchmarking suites.
You're anticipating a need to separate out solver benchmarks (cutest) vs compile-time benchmarks?
Anyway, given that this is now in a separate directory, I'm realising that I think these will be ignored by pytest by default, and only run if you're in the directory or if you run `pytest benchmarks` from the top level, so perhaps we don't need a flag here at all?
No, they're not ignored by default, even if they are in a separate directory. They would be ignored if the names did not start with `test_`, but then we'd have the reverse problem.
I think runtime and compile-time benchmarks can be run with the same problem/solver setup, but the `benchmark` fixture stores all the results and I think things might get muddled there? But happy to try.
I took a quick look at the structure of the `.json` output, and it seems wiser to have two different test functions for compile and runtime. Keeps things neater.
Okay! So in that case maybe `--compile-benchmarks` and `--runtime-benchmarks`? Or compose with existing pytest functionality, `pytest --benchmark -k compile` vs `pytest --benchmark -k runtime`?
Mostly I'd just like to have the flag make it clear that this isn't about running slow tests or something.
We now use these built-in flags and skip by default. Regular `-k` flags also work.
I moved the benchmarking script into the regular tests folder again, since it is now quite small. This makes the setup less complex; in particular, we are no longer dealing with two different conftest.py files. I had noticed that running with `pytest benchmarks/` was actually needed to pull in the custom flags (e.g. `--max-dimension`). Now there is a single flag to run the benchmarks, and custom ones for subsets.
I'm now going with:
> And we might want to tweak the configuration - to check how well we do in 32-bit, for instance - without affecting the main test suite.
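For concreteness, a minimal sketch of what a conftest.py along these lines could look like - not the PR's actual code; the flag name comes from this thread, while the default value and the skip logic are assumptions:

```python
import pytest


def pytest_addoption(parser):
    # Custom flag for running a subset of the benchmark problems.
    parser.addoption(
        "--max-dimension",
        action="store",
        type=int,
        default=None,
        help="Only run benchmark problems up to this dimension.",
    )


@pytest.fixture
def max_dimension(request):
    return request.config.getoption("--max-dimension")


def pytest_collection_modifyitems(config, items):
    # Skip benchmarks by default; run them only when pytest-benchmark's
    # built-in --benchmark-only flag is passed.
    if config.getoption("--benchmark-only", default=False):
        return
    skip_benchmark = pytest.mark.skip(reason="pass --benchmark-only to run benchmarks")
    for item in items:
        if "benchmark" in item.keywords:
            item.add_marker(skip_benchmark)
```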
benchmarks/test_benchmarks.py (outdated)
```python
def block_tree_until_ready(x):
    dynamic, static = eqx.partition(x, eqx.is_inexact_array)
    dynamic = jtu.tree_map(lambda x: x.block_until_ready(), dynamic)
    return eqx.combine(dynamic, static)
```
FWIW the partition/combine operations will add a small but measurable amount of overhead, as this happens outside JIT.
I'd be tempted to suggest instead that outputs should be required to always only be JAX arrays, and to then call `jax.block_until_ready`.
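A rough sketch of that suggestion (`solve` and the chosen output fields are placeholders, not the code in this PR):

```python
import jax


def wrapped(y0):
    solution = solve(y0)  # placeholder for the jitted solver call
    # With only JAX arrays in the output, a single pytree-aware call suffices
    # and avoids the partition/combine round-trip outside of JIT.
    return jax.block_until_ready((solution.value, solution.state.f_info.f))
```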
We could return the value of the objective function at the solution and use that to determine if the problem has been solved to some degree of accuracy.
This is all that is available for 90+ % of CUTEST problems anyway.
More detailed metrics such as the number of iterations could be analysed separately, but we'd lose access to them here.
Hmm I think these are separate concerns? I'm not considering convergence here, just a wallclock overhead.
Ah, you mean only returning the dynamic part of the solution? I understood this as doing something like `return solution.state.f_info.f` or something!
So we need at least `solution.result`, which does not support `block_until_ready`. To accommodate this, I wrote a minimal wrapper that calls the compiled function, calls `block_until_ready` on the objective value, and returns some extra stuff as well.
Since the function does not actually do anything else, the overhead should be minimal. I don't know how it compares to a tree-mapped `block_until_ready`, though. If we really want to trim the microseconds then we could use `success = solution.result == optx.RESULTS.successful`, which is an array.
I suggest we do that when benchmarking for publication, but otherwise it seems overkill - the purpose of the benchmark set is to catch regressions and test the performance of new features, and in this case I think it is nice to have a .json file from which we can read informative results, not just minimal timing results.
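Reconstructed from the discussion, the wrapper could look roughly like this (the `num_steps` lookup is an assumption):

```python
def wrapped(y0):
    solution = solve(y0)  # the pre-compiled solve
    # Block only on the objective value; the other fields are returned as-is
    # so they end up in the saved .json alongside the timings.
    objective_value = solution.state.f_info.f.block_until_ready()
    num_steps = solution.stats["num_steps"]  # assumption: taken from the stats dict
    return objective_value, solution.result, num_steps
```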
Hi @johannahaffner,
They are! Thank you :)
I was completely unaware of this! But that makes sense. Depending on the size this benchmark set grows to (fingers crossed), we might not want to integrate it into the CI, but rather run it locally and add some scripts to help analyse the results. So this is another point in favour of maybe not doing this as part of our CI.
I'm introducing a benchmark-time dependency on sif2jax here, which is a current WIP of mine aiming to (have LLMs) port the complete CUTEst collection to clean, human-readable and functionally pure JAX implementations. I'll mark this as ready for review again when we have done further verification of the benchmark problems; currently about 247 benchmark problems pass tests against the Fortran implementation.
This does seem to have improved? I've left this in as a commented-out suggestion, but I'm not quite sure what specific behaviour it is meant to prevent if enabled, @pfackeldey.
So I tested this a bit more rigorously and it seems like most things are working on our end! What we have right now:
I updated the README to reflect this. Things that require a bit of thought still:
I'll mark this as ready to review for now :)
Typically this is handled with a work-precision diagram.
Interesting idea! This strikes me as the kind of thing we should log somewhere, just not in version control.
I think that sounds like a reasonable ask! Realistically if someone is going to the effort of contributing a solver then they're already in pretty deep, and probably have opinions about the existing options available :D
sg!
I really like the look of this.
benchmarks/README.md (outdated)

For reproducibility, make sure that all your changes have been committed and your working tree is clean before running the benchmarks. `pytest-benchmark` will otherwise mark your benchmarking results as `dirty`. Saved results include the commit, branch, version, and an exact timestamp by default.

Note that benchmarks are run with `throw=False` enabled, since otherwise no result is written to the json file, but we do want to know if we failed to solve a problem.
If you wanted, we could/should set `EQX_ON_ERROR=nan` here too. This is essentially equivalent to the above and is also a performance improvement.
(This has to happen before Equinox is imported.)
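In a dedicated benchmarks conftest, a minimal sketch of this would be (the `noqa` is needed precisely because the assignment must precede the import):

```python
import os

# Must be set before Equinox is imported anywhere in the session.
os.environ["EQX_ON_ERROR"] = "nan"

import equinox as eqx  # noqa: E402
```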
Note that this is essential for vjp, as `throw=False` does nothing there, I believe.
Argh, I see that I might have boxed myself a little into a corner here - I could simplify the workflow and put the benchmarks back in the tests folder. This way we only get one conftest and one setup to deal with, and there are no stupid footguns such as custom flags only being available if we're also specifying the folder name: e.g. `pytest benchmarks/ --benchmark-only --max-dimension=10` would work, but `pytest --benchmark-only --max-dimension=10` would not, which I had not previously realised.
But this also means that the environment variable applies to the whole test session, right? And while we want it for benchmarking, we don't want it for testing.
Can I force a fresh environment for a specific testing module? Alternatively, it might make sense to modify `--benchmark-only` so that if it is present, we do set the environment variable, and not otherwise.
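A sketch of that alternative, assuming Equinox is only imported by the test modules themselves (which are imported after `pytest_configure` runs):

```python
import os


def pytest_configure(config):
    # Only set the variable when benchmarks are actually requested,
    # leaving the regular test suite untouched.
    if config.getoption("--benchmark-only", default=False):
        os.environ["EQX_ON_ERROR"] = "nan"
```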
Yes, I would use a new environment for benchmarks (I find this helpful in my workflow, as tests run faster than benchmarks and I can fail fast if something is wrong). Then there is no need to fiddle around with environment variables in `tests/` or `benchmarks/`, and they are just set in the workflow actions (or by the user). This is also helpful as it allows the user to run with different settings to compare impact.
In GitHub workflows? That makes sense! Locally we default to no benchmark runs, and I just discovered that I can monkeypatch the environment variables for specific tests only - would that work, @patrick-kidger?
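A sketch of that monkeypatching (with the caveat that it only takes effect if Equinox reads the variable lazily rather than once at import time - which is exactly the open question):

```python
import pytest


@pytest.fixture
def eqx_nan_on_error(monkeypatch):
    # Scoped to the tests that request this fixture only.
    monkeypatch.setenv("EQX_ON_ERROR", "nan")
```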
```python
def wrapped(y0):
    solution = solve(y0)
    objective_value = solution.state.f_info.f.block_until_ready()
```
Maybe just `jax.block_until_ready(solution)`?
This would add the overhead of a `tree_map`, and for that reason we had decided not to do that. Background here :)
Agreed
benchmarks/test_benchmarks.py (outdated)
```python
    return objective_value, solution.result, num_steps

# Benchmark the runtime of the compiled function
values = benchmark.pedantic(wrapped, args=(problem.y0(),), rounds=5, iterations=1)
```
How does the benchmarking work here, btw? As compared to e.g. `min(timeit.repeat(..., number=..., repeat=...))` or the IPython `%%timeit`? (E.g. the former uses minimisation to handle one-sided noise, whilst the latter automatically figures out how many runs to perform.)
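For reference, the stdlib idiom referred to here, with `solve` and `y0` as placeholders:

```python
import timeit

# Take the minimum over several repeats to suppress one-sided timing noise.
best = min(timeit.repeat(lambda: solve(y0), number=1, repeat=5))
```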
So `rounds` and `iterations` correspond to `repeat` and `number`. I switched to @jpbrodrick89's suggestion of not using the pedantic option.
When saving the result, more than just the minimum gets saved - a bunch of statistics get written to an output `.json`, and these can be selected among during analysis. There is also an option to save full results if desired.
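The non-pedantic call then looks roughly like this (placeholder names; pytest-benchmark chooses rounds and iterations itself):

```python
def test_runtime(benchmark, problem):
    wrapped = make_wrapped_solve(problem)  # hypothetical: jitted solve + block_until_ready
    y0 = problem.y0()
    wrapped(y0)  # warm-up call keeps compilation out of the timed rounds
    benchmark(wrapped, y0)
```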
Typically 5 rounds is very small compared to the defaults and will be very susceptible to noise, but it depends on how expensive the calculation is. I typically trust the standard non-pedantic option.
Hi guys, I will chime in with a tiny bit more detail this afternoon, as I've been trying a lot of this on one of my repos. Unfortunately, CodSpeed's usual instrumentation provides no value for JAX functions, as everything counts as a system call, which is not trackable by Valgrind. Therefore, to use their tooling you have to use their wall-time runner, where costs can mount up pretty quickly. I had 360 micro-benchmarks and used up the free 2 hours in just four commits. Also, even on their bare-metal wall-time runner there is still some noise, probably due to the time spent actually retrieving the function from the JAX cache rather than running the actual flops. Trying to work out if I can make this more usable next week. At least they've updated their docs in the last couple of weeks quite substantially.
A call would be great! I was just getting started experimenting with it (apologies for spamming your inbox with workflow failures, @patrick-kidger). In theory it seems like a nice setup!
Addressed most comments, rest tonight :) Thank you both for your input!
benchmarks/conftest.py (outdated)
```python
jax.config.update("jax_enable_x64", True)
jax.config.update("jax_numpy_rank_promotion", "raise")
jax.config.update("jax_numpy_dtype_promotion", "standard")
# Remark Peter: limit JAX to single-thread with taskset -c 0 pytest --cutest
```
I'm not sure what this is meant to prevent/accomplish, if we added it.
.github/workflows/codspeed.yml (outdated)
```yaml
# - name: Run benchmarks
#   uses: CodSpeedHQ/action@v3
#   with:
```
This is where you can add the env variables:
```yaml
# with:
# env:
#   EQX_ON_ERROR: "nan"
```
Summary of changes
- Streamlined the workflow
- Addressed various other points
- No codspeed

So for now we have benchmarks that can be used during development to get feedback on how our solvers are doing, and whose output can inform discussions. This is a step up from the microbenchmarks we otherwise default to, and we can still move to a workflow-based option in the future, if the codspeed issues can be resolved or if we find a better alternative.
Oh and I have a dirty workaround for now:
FYI, I assume this only makes sense if you are interested in having a history of benchmarks over multiple CI runs, and given the premise that GitHub Actions runners may have different numbers of cores. (I'm not sure how homogeneous GitHub Actions runners are?) (This only works on Linux, and if the cores differ in clock frequency between different workers, that also doesn't help, of course.)
Thank you for the explanation! Does this mean that you can get around some of the variability in CI runs? Apparently how fast things are in there depends on a lot of factors. That is actually what CodSpeed advertises - they claim that they get the noise down from around 30% to 1%, and they do that through some clever isolation of the bit to be benchmarked, which is then run just once. But this does, apparently, not really mesh well with JAX. FWIW, I've come to the conclusion that the pragmatic way forward is to just use the benchmarks locally during development.
Yes, good chat @johannahaffner! To summarise, basically CodSpeed "instrumentation" is a dead end and CodSpeed "wall time" can easily eat up minutes if you're not careful. I'm playing with it in our repo and will see if I can share any further feedback somehow when I've tried a bit more customisation. I'm not sure whether setting the environment variable using monkeypatch will work, it all depends on whether
With regards to threads and taskset, it might be interesting to do separate runs trying to disable threading or not, but in practice users will let JAX use the cores it can, so letting it use more than one (maybe with an upper limit) might be more realistic. Standard runners always have 4 cores, so this would seem a reasonable upper bound in case you somehow get thrown onto a better one. Clock speed and traffic variation is much more of a problem. Nice work on all this!
Force-pushed from 983d24d to c88ab03.
…luation of Optimistix' solvers.
Force-pushed from ebc5896 to 640b26a.
I think this is ready! Thanks to all of you for your help, @patrick-kidger @jpbrodrick89 and @pfackeldey. Small updates made since last synopsis:
Exciting times @johannahaffner! 🚀 It looks like you set `EQX_ON_ERROR` as a global Python variable rather than an OS environment variable; see here for different options for setting env variables in pytest: https://stackoverflow.com/questions/36141024/how-to-pass-environment-variables-to-pytest The way I do it is just to set it on the command line with
Thank you! I'm quite unfamiliar with this one. I switched to using
Awesome! Not at the desk right now to check, but I think you should be able to get rid of the `noqa: E402`s now, as `os` is meant to be ignored (https://docs.astral.sh/ruff/rules/module-import-not-at-top-of-file/). I'm not sure about the I001 - it would be nice if this had a similar exception, but it's not documented.
Indeed! And
I tried some benchmarking on our brand new L-BFGS and the current setup, using
This PR is a proposal for what integration of benchmarks could look like.
I suggest:
I have not included any compile-time benchmarking yet, but I think this thing here can happily live next to differently flavoured pytest-codspeed benchmarks, such as discussed in patrick-kidger/equinox#1001.
I'm parsing a collection of CUTEST problems for now, and once you get rid of all the FORTRAN stuff and weird formatting, these don't take up that much space and can either live here, or perhaps elsewhere if that is more practical.