
Conversation

johannahaffner
Collaborator

This PR is a proposal for what integration of benchmarks could look like.

I suggest:

  • keeping benchmark tests separate
  • running these with the usual, familiar testing machinery
  • keeping them out of the normal CI, at least once the benchmark collection is large enough to make inclusion impractical
  • including extra information, such as the number of steps, the result of a nonlinear solve, and so on, which is useful for comparison to other implementations
  • saving benchmark results to a JSON file, which enables comparison across versions (checking that we still get the same accuracy, whether we now need more steps, and so on)

I have not included any compile-time benchmarking yet, but I think this can happily live next to differently flavoured pytest-codspeed benchmarks, such as those discussed in patrick-kidger/equinox#1001

I'm parsing a collection of CUTEst problems for now; once you get rid of all the Fortran stuff and weird formatting, these don't take up that much space and can either live here, or perhaps elsewhere if that is more practical.
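
To give a rough idea of what I mean, a benchmark test could look along these lines (a minimal sketch with a placeholder problem, not the actual code in this PR - but benchmark.extra_info, --benchmark-save and --benchmark-json are standard pytest-benchmark features):

```python
import jax.numpy as jnp
import optimistix as optx
import pytest


def quadratic(y, args):
    # Placeholder objective; real tests would use a parsed CUTEst problem.
    return jnp.sum(y**2)


@pytest.mark.benchmark
def test_quadratic_bfgs(benchmark):
    solver = optx.BFGS(rtol=1e-8, atol=1e-8)
    y0 = jnp.arange(1.0, 4.0)

    def solve():
        return optx.minimise(quadratic, solver, y0, throw=False)

    solution = benchmark(solve)
    # Extra information ends up in the JSON written via --benchmark-save/--benchmark-json,
    # which is what makes comparison across versions possible.
    benchmark.extra_info["num_steps"] = int(solution.stats["num_steps"])
    benchmark.extra_info["result"] = str(solution.result)
```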

Owner

@patrick-kidger patrick-kidger left a comment


I think the basic structure of this looks good to me! I like being able to integrate this into pytest.

@johannahaffner
Collaborator Author

johannahaffner commented Apr 29, 2025

skipif works, but then the benchmark folder has to be in tests, since otherwise we cannot run pytest benchmarks separately: in that case pytest won't look in the tests folder, where the extra option is configured. I think it's fine to have the benchmarks folder be a sub-directory of tests, so I moved it. (Running just the benchmarks is now possible with pytest tests/benchmarks.)

I also added an abstract base class for benchmark problems, and implemented this for our very favourite Rosenbrock function.
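
Roughly, the structure looks like this (a sketch only; the actual class and method names in the PR may differ):

```python
import abc

import equinox as eqx
import jax.numpy as jnp


class AbstractBenchmarkProblem(eqx.Module):
    """Base class for benchmark problems (names here are illustrative)."""

    @abc.abstractmethod
    def objective(self, y, args):
        """Objective in optimistix's `fn(y, args)` convention."""

    @abc.abstractmethod
    def y0(self):
        """Initial guess."""


class Rosenbrock(AbstractBenchmarkProblem):
    def objective(self, y, args):
        return jnp.sum(100.0 * (y[1:] - y[:-1] ** 2) ** 2 + (1.0 - y[:-1]) ** 2)

    def y0(self):
        return jnp.zeros(2)
```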

Owner

@patrick-kidger patrick-kidger left a comment

Maybe this would work if it were defined in a benchmarks/conftest.py (outside of tests)?

CONTRIBUTING.md Outdated
benchmark tests with

```bash
pytest --extensive
```
Owner

Maybe --benchmark?

Collaborator Author

I think there is too much confusion with the built-in flags then? We already have --benchmark-save, which is built-in and specifies where to save the results in benchmark format, and we also use the built-in benchmarking fixture with @pytest.mark.benchmark.

I think that means that our own decorator and flags should be different! Maybe optx_benchmarks?

Collaborator Author

Or @cutest and --cutest? This will work fine until we integrate other benchmarking suites.

Owner

You're anticipating a need to separate out solver benchmarks (cutest) vs compile-time benchmarks?

Anyway, given that this is now in a separate directory, I'm realising that I think these will be ignored by pytest by default, and only run if you're in the directory or if you run pytest benchmarks from the top level, so perhaps we don't need a flag here at all?

Collaborator Author

No, they're not ignored by default, even if they are in a separate directory.

They would be ignored if the names did not start with test_, but then we'd have the reverse problem.

I think runtime and compile time benchmarks can be run with the same problem / solver setup, but the benchmark fixture stores all the results and I think things might get muddled there? But happy to try.

Collaborator Author

I took a quick look at the structure of the .json output, and it seems wiser to have two different test functions for compile and runtime. Keeps things neater.
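
For the compile-time half, I'm imagining something along the lines of the following sketch (names are placeholders; it uses JAX's ahead-of-time lowering API, with jax.clear_caches() so that each round measures a cold compile):

```python
import jax
import jax.numpy as jnp
import pytest


@pytest.mark.benchmark
def test_compile_time(benchmark):
    y0 = jnp.zeros(2)  # placeholder initial value

    def solve(y):
        # Placeholder for the actual problem/solver call being benchmarked.
        return jnp.sum(jnp.sin(y) ** 2)

    def compile_only():
        jax.clear_caches()  # make sure we measure a cold compile each round
        jax.jit(solve).lower(y0).compile()

    benchmark(compile_only)
```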

Owner

Okay! So in that case maybe --compile-benchmarks and --runtime-benchmarks?

Or compose with existing pytest functionality, pytest --benchmark -k compile vs pytest --benchmark -k runtime?

Mostly I'd just like to have the flag make it clear that this isn't about running slow tests or something.

Collaborator Author

We now use these built-in flags and skip by default. Regular -k flags also work.
I moved the benchmarking script into the regular tests folder again, since it is now quite small.

This makes the setup less complex: in particular, we are no longer dealing with two different conftest.py files. I had noticed that running with pytest benchmarks/ was actually needed to pull in the custom flags (e.g. --max-dimension). Now there is a single flag to run the benchmarks, and custom ones for subsets.
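
For reference, the custom-flag part looks roughly like this (a sketch; --max-dimension is the flag from this thread, while skipping-by-default relies on pytest-benchmark's built-in --benchmark-skip/--benchmark-only):

```python
# conftest.py (sketch)
import pytest


def pytest_addoption(parser):
    # Custom flag: only run benchmark problems up to this dimension.
    parser.addoption("--max-dimension", action="store", type=int, default=None)


@pytest.fixture
def max_dimension(request):
    return request.config.getoption("--max-dimension")
```

One way to get the skip-by-default behaviour (an assumption about the exact wiring) is addopts = "--benchmark-skip" in the pytest configuration, overridden locally with pytest --benchmark-only.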

@johannahaffner
Collaborator Author

I'm now going with @cutest, and benchmarks now have their own configuration. Both might be useful, since we might want to run different benchmarking suites (once we get there!) independently of one another, if they are geared toward different problems and require different solvers.

And we might want to tweak the configuration - to check how well we do in 32-bit, for instance - without affecting the main test suite.


Comment on lines 12 to 15

```python
def block_tree_until_ready(x):
    dynamic, static = eqx.partition(x, eqx.is_inexact_array)
    dynamic = jtu.tree_map(lambda x: x.block_until_ready(), dynamic)
    return eqx.combine(dynamic, static)
```
Owner

FWIW the partition/combine operations will add a small but measurable amount of overhead, as this happens outside JIT.

I'd be tempted to suggest instead that outputs should be required to always only be JAX arrays, and to then call jax.block_until_ready.

Collaborator Author

We could return the value of the objective function at the solution and use that to determine if the problem has been solved to some degree of accuracy.

This is all that is available for 90+% of CUTEst problems anyway.

More detailed metrics such as the number of iterations could be analysed separately, but we'd lose access to them here.

Owner

Hmm I think these are separate concerns? I'm not considering convergence here, just a wallclock overhead.

Collaborator Author

Ah, you mean only returning the dynamic part of the solution? I understood this as doing something like return solution.state.f_info.f!

Collaborator Author

So we need at least solution.RESULTS, which does not support block_until_ready. To accommodate this, I wrote a minimal wrapper that calls the compiled function, calls block_until_ready on the objective value, and returns some extra information as well.

Since the function does not actually do anything else, the overhead should be minimal. I don't know how it compares to a tree-mapped block_until_ready, though. If we really want to trim the microseconds then we could use success = solution.result == optx.RESULTS.successful, which is an array.

I suggest we do that when benchmarking for publication, but otherwise it seems like overkill - the purpose of the benchmark set is to catch regressions and to test the performance of new features, and for that I think it is nice to have a .json file from which we can read informative results, not just minimal timings.

@pfackeldey

pfackeldey commented May 4, 2025

Hi @johannahaffner,
not sure if these thoughts are useful for you here, but:

  • I think GitHub Actions workers are not necessarily very homogeneous, afaik. So you might get different runtime values if you happen to land on a different worker. Of course, it's always better to have a server that you can use, with a reliably small uncertainty on the runtime.
  • you might want to restrict JAX to use a single thread only. I ran some benchmarks in the past and found it hard to restrict this (maybe it's better by now?). At least on Linux you can use taskset -c 0 pytest --cutest as a "solution".

@johannahaffner
Collaborator Author

They are! Thank you :)

> GitHub Actions workers are not necessarily very homogeneous

I was completely unaware of this! But that makes sense. I think depending on the size this benchmark set grows to (fingers crossed), we might not want to integrate it into the CI, but actually run it locally and add some scripts to help analyse the results. So this is another point in favour of maybe not doing this as part of our CI.

> you might want to restrict JAX to use a single thread only.

And good point! We should take this into account when comparing results to other implementations.

@johannahaffner
Collaborator Author

I'm introducing a benchmark-time dependency on sif2jax here, which is a current WIP of mine aiming to (have LLMs) port the complete CUTEst collection to clean, human-readable and functionally pure JAX implementations.

I'll mark this as ready for review again when we have done further verification of the benchmark problems; currently about 247 benchmark problems pass tests against the Fortran implementation.

@johannahaffner
Collaborator Author

> you might want to restrict JAX to use a single thread only. I ran some benchmarks in the past and found it hard to restrict this (maybe it's better by now?). At least on Linux you can use taskset -c 0 pytest --cutest as a "solution".

This does seem to have improved? I've left this in as a commented-out suggestion, but I'm not quite sure what specific behaviour it is meant to prevent if enabled, @pfackeldey.

@johannahaffner
Collaborator Author

johannahaffner commented Jun 16, 2025

So I tested this a bit more rigorously and it seems like most things are working on our end!

What we have right now:

  • the ability to run benchmarks on CUTEst problems provided by sif2jax (this introduces a benchmark-time dependency on a project of mine)
  • selectively run on problems up to a certain size
  • compare against scipy for specific, defined solvers (other implementations, e.g. in JAXopt, Optax, or TensorFlow, can be added following the same pattern)
  • analyse benchmark results - all the pytest-benchmark features, e.g. comparison across runs; saved results include branch, commit info, timestamp, and worktree status (dirty/clean - only clean guarantees reproducibility)
  • custom analysis: performance profiles of select solvers

I updated the README to reflect this.
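
(For reference, the save/compare cycle uses pytest-benchmark's built-in flags; the invocation below is illustrative:)

```bash
# Save a named baseline run, then compare a later run against the latest saved one.
pytest --benchmark-only --benchmark-save=baseline
pytest --benchmark-only --benchmark-compare
```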

Things that require a bit of thought still:

  • how to specify the solvers we're comparing against - solver tolerances do not necessarily have the same meaning in different libraries
  • what additional metadata to store and assess in order to check whether solver comparisons are truly fair
  • things like maybe allowing solver-specific maximum numbers of steps
  • how/if to put benchmark results under version control so contributors can compare against each other's runs (pytest-benchmark saves these in folders that are specific to the platform (Darwin, Linux, ...), the Python implementation (CPython, ...), the Python version and the precision used, so no risk of mixing these up). Currently the benchmarking results are git ignored.
  • I think it would be nice to ask Contributors that add a new solver if they could specify alternative implementations to benchmark against, since they are probably best positioned to judge the finer details of this -> this can go into CONTRIBUTING.md. Or does that make things sound too scary?
  • if we're happy with the structure, then compile-time benchmarks can be added too.

I'll mark this as ready to review for now :)

@johannahaffner johannahaffner marked this pull request as ready for review June 16, 2025 00:33
@patrick-kidger
Owner

> how to specify the solvers we're comparing against - solver tolerances do not necessarily have the same meaning in different libraries

Typically this is handled with a work-precision diagram.

> how/if to put benchmark results under version control so contributors can compare against each other's runs (pytest-benchmark saves these in folders that are specific to the platform (Darwin, Linux, ...), the Python implementation (CPython, ...), the Python version and the precision used, so no risk of mixing these up). Currently the benchmarking results are git ignored.

Interesting idea! This strikes me as the kind of thing we should log somewhere, just not in version control.

> I think it would be nice to ask Contributors that add a new solver if they could specify alternative implementations to benchmark against, since they are probably best positioned to judge the finer details of this -> this can go into CONTRIBUTING.md. Or does that make things sound too scary?

I think that sounds like a reasonable ask! Realistically if someone is going to the effort of contributing a solver then they're already in pretty deep, and probably have opinions about the existing options available :D

> if we're happy with the structure, then compile-time benchmarks can be added too.

sg!

Owner

@patrick-kidger patrick-kidger left a comment

I really like the look of this.


For reproducibility, make sure that all your changes have been committed and your working tree is clean before running the benchmarks. `pytest-benchmark` will otherwise mark your benchmarking results as `dirty`.
Saved results include the commit, branch, version, and an exact timestamp by default.

Note that benchmarks are run with `throw=False` enabled, since otherwise no result is written in the json file, but we do want to know if we failed to solve a problem.
Owner

If you wanted we could/should set EQX_ON_ERROR=nan here too. This is essentially equivalent to the above and is also a performance improvement.

(This has to happen before Equinox is imported.)


Note that this is essential for vjp, as throw=False does nothing there, I believe.

Collaborator Author

Argh, I see that I might have boxed myself a little into a corner here - I could simplify the workflow and put the benchmarks back in the tests folder. This way we only get one conftest and one setup to deal with, and there are no stupid footguns such as custom flags only being available if we're also specifying the folder name: e.g.

pytest benchmarks/ --benchmark-only --max-dimension=10

would work, but

pytest --benchmark-only --max-dimension=10

would not, which I had not previously realised.

But this also means that the environment variable applies to the whole test session, right? And while we want it for benchmarking, we don't want it for testing.

Can I force a fresh environment for a specific testing module? Alternatively it might make sense to modify --benchmark-only so that if present, we do set the environment variable and not otherwise.


Yes, I would use a new environment for benchmarks (I find this helpful in my workflow, as tests run faster than benchmarks and I can fail fast if something is wrong). Then there is no need to fiddle around with environment variables in tests/ or benchmarks/, and they are just set in the workflow actions (or by the user). This is also helpful as it allows the user to run with different settings to compare their impact.

Collaborator Author

In GitHub workflows? That makes sense! Locally we default to no benchmark runs, and I just discovered that I can monkeypatch the environment variables for specific tests only - would that work, @patrick-kidger?


```python
def wrapped(y0):
    solution = solve(y0)
    objective_value = solution.state.f_info.f.block_until_ready()
```
Owner

Maybe just jax.block_until_ready(solution)?

Collaborator Author

This would add the overhead of a tree_map and for that reason we had decided not to do that. Background here :)


Agreed

```python
    return objective_value, solution.result, num_steps

# Benchmark the runtime of the compiled function
values = benchmark.pedantic(wrapped, args=(problem.y0(),), rounds=5, iterations=1)
```
Owner

How does the benchmarking work here, btw? As compared to e.g. min(timeit.repeat(..., number=..., repeat=...)) or the IPython %%timeit? (E.g. the former uses minimisation to handle one-sided noise, whilst the latter automatically figures out how many runs to perform.)

Collaborator Author

So rounds and iterations correspond to repeat and number. I switched to @jpbrodrick89's suggestion of not using the pedantic option.

When saving the result, more than just the minimum gets saved - a bunch of statistics get written to an output .json, and these can be selected from during analysis. There is also an option to save the full results if desired.


Typically, 5 rounds is very small compared to the defaults and will be very susceptible to noise, but it depends on how expensive the calculation is. I typically trust the standard non-pedantic option.

@jpbrodrick89

Hi guys, I will chime in with a tiny bit more detail this afternoon, as I've been trying a lot of this on one of my repos. Unfortunately, CodSpeed's usual instrumentation provides no value for JAX functions, as everything counts as a system call, which is not trackable by valgrind. Therefore, to use their tooling you have to use their wall-time runner, where costs can mount up pretty quickly. I had 360 micro-benchmarks and used up the free 2 hours in just four commits. Also, even on their bare-metal wall-time runner there is still some noise, probably due to the time spent actually retrieving the function from the JAX cache rather than running the actual flops. I'm trying to work out if I can make this more usable next week. At least they've updated their docs quite substantially in the last couple of weeks.
@johannahaffner happy to have a call if helpful.

@johannahaffner
Collaborator Author

> @johannahaffner happy to have a call if helpful.

A call would be great! I was just getting started experimenting with it (apologies for spamming your inbox with workflow failures, @patrick-kidger). In theory it seems like a nice setup!

Collaborator Author

@johannahaffner johannahaffner left a comment

Addressed most comments, rest tonight :) Thank you both for your input!


```python
jax.config.update("jax_enable_x64", True)
jax.config.update("jax_numpy_rank_promotion", "raise")
jax.config.update("jax_numpy_dtype_promotion", "standard")
# Remark Peter: limit JAX to single-thread with taskset -c 0 pytest --cutest
```
Collaborator Author

I'm not sure what this is meant to prevent/accomplish, if we added it.





```yaml
# - name: Run benchmarks
#   uses: CodSpeedHQ/action@v3
#   with:
```


This is where you can add the env variables

Suggested change

```diff
-# with:
+# with:
+# env:
+#   EQX_ON_ERROR: "nan"
```

@johannahaffner
Collaborator Author

johannahaffner commented Jun 20, 2025

Summary of changes

Streamlined the workflow

  • We're now exclusively using built-in flags and defaulting to no benchmark runs (pytest --benchmark-only is now all it takes to run the benchmarks - this overrides the default). So we're forward-compatible with everything, and the syntax is clearer as well.
  • Regular pytest flags, like -k test_something.py and -s, work with pytest-benchmark too.
  • We now have a single conftest, and the benchmarks have been moved back to the tests folder. EQX_ON_ERROR is set for the relevant tests via the monkeypatch fixture. Is that enough to make Equinox change its behaviour for the benchmark tests?
  • Documentation has been merged: we now have a new section in CONTRIBUTING.md about benchmarking, and the old benchmarks/README.md has been removed.

Addressed various other points

  • pandas is removed
  • fire is introduced as a dependency (thanks for the suggestion)
  • I'm still only blocking on the value of the objective - we had decided earlier against the tree-mapping overhead that would be incurred by blocking on the solution as a whole (here).
  • We're now using default numbers of rounds and iterations.
  • I opted against using taskset, because I don't really understand what it would/should do

No CodSpeed
I started experimenting with CodSpeed and, after a call with @jpbrodrick89, decided against including it. Essentially it is not very JAX-compatible (yet) and the output is not that informative.

So for now we have benchmarks that can be used during development to get feedback on how our solvers are doing, and whose output can inform discussions. This is a step up from the microbenchmarks we otherwise default to, and we can still move to a workflow-based option in the future, if the CodSpeed issues can be resolved or if we find a better alternative.

@johannahaffner
Collaborator Author

Oh and I have a dirty workaround for now: dtype and rank promotion are not in strict mode and do not raise, because I need to fix something upstream first. I will make sure sif2jax is compatible with this!

@pfackeldey

pfackeldey commented Jun 20, 2025

> I opted against using taskset, because I don't really understand what it would/should do

Fyi, taskset lets you restrict which CPU cores a process may run on. E.g. taskset -c 1 python jax_code.py would only use one core, and not however many are available on the machine. IIRC JAX is rather aggressive about this and uses whatever is available.

I assume this only makes sense if you are interested in having a history of benchmarks over multiple CI runs, given the premise that GitHub Actions runners may have different numbers of cores. (I'm not sure how homogeneous GitHub Actions runners are?)

(This only works on Linux, and if the cores differ in clock frequency between different workers, that also doesn't help, of course.)

@johannahaffner
Collaborator Author

> I assume this only makes sense if you are interested in having a history of benchmarks over multiple CI runs, given the premise that GitHub Actions runners may have different numbers of cores. (I'm not sure how homogeneous GitHub Actions runners are?)
>
> (This only works on Linux, and if the cores differ in clock frequency between different workers, that also doesn't help, of course.)

Thank you for the explanation! Does this mean that you can get around some of the variability in CI runs? Apparently, how fast things are in there depends on a lot of factors. That is actually what CodSpeed advertises - they claim that they get the noise down from around 30% to 1%, and they do that through some clever isolation of the bit to be benchmarked, which is then run just once. But this does, apparently, not really mesh well with JAX.

FWIW I've come to the conclusion that the pragmatic way forward is to just use the benchmarks locally during development.

@jpbrodrick89

Yes, good chat @johannahaffner! To summarise, basically CodSpeed "instrumentation" is a dead end and CodSpeed "wall time" can easily eat up minutes if you're not careful. I'm playing with it in our repo and will see if I can share any further feedback somehow when I've tried a bit more customisation.

I'm not sure whether setting the environment variable using monkeypatch will work; it all depends on whether pytest reloads Equinox for each test after this has been set. I would have assumed not, but pytest is still kind of magic to me, so you never know! I guess you can come up with a test to make sure it works? Otherwise, yes, just

With regard to threads and taskset, it might be interesting to do separate runs with threading disabled or not, but in practice users will let JAX use the cores it can, so letting it use more than one (maybe with an upper limit set) might be more realistic. Standard runners always have 4 cores, so this would seem a reasonable upper bound in case you somehow get thrown onto a better one. Clock speed and traffic variation are much more of a problem.

Nice work on all this!

@johannahaffner johannahaffner force-pushed the pytest-benchmark branch 2 times, most recently from 983d24d to c88ab03 on June 22, 2025 14:08
@johannahaffner
Collaborator Author

I think this is ready! Thanks to all of you for your help, @patrick-kidger @jpbrodrick89 and @pfackeldey.

Small updates made since last synopsis:

  • accepted @jpbrodrick89's suggestion to use device_put
  • added a comment to use taskset in run_tests.yml - for future reference, once this becomes part of the CI
  • bumped the minimum version of sif2jax to one that is compatible with strict dtype and rank promotion rules
  • moved the benchmarking script back to the benchmarks folder, with its own conftest - this is to make sure that we have a separate benchmarking environment, which now uses EQX_ON_ERROR = "nan" and does not monkeypatch anything.
  • updated CONTRIBUTING.md to match.

@jpbrodrick89

jpbrodrick89 commented Jun 22, 2025

Exciting times @johannahaffner! 🚀 It looks like you set EQX_ON_ERROR as a global Python variable rather than an OS environment variable; see here for different options for setting environment variables in pytest: https://stackoverflow.com/questions/36141024/how-to-pass-environment-variables-to-pytest

The way I do it is just to set it on the command line, with EQX_ON_ERROR=nan pytest ....

@johannahaffner
Collaborator Author

Thank you! I'm quite unfamiliar with this one. I switched to using os.environ to avoid long commands.
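
(For reference, the pattern in the benchmarks conftest is roughly the following sketch; the exact file may differ:)

```python
# benchmarks/conftest.py (sketch)
import os

# Must be set before Equinox is imported anywhere in the benchmark session,
# so it lives at the very top of the benchmarks' own conftest.py.
os.environ["EQX_ON_ERROR"] = "nan"

import jax  # noqa: E402

jax.config.update("jax_enable_x64", True)
```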

@jpbrodrick89

Awesome! Not at the desk right now to check, but I think you should be able to get rid of the noqa E402s now, as os is meant to be ignored (https://docs.astral.sh/ruff/rules/module-import-not-at-top-of-file/). I'm not sure about the I001 - it would be nice if this had a similar exception, but it's not documented.

@johannahaffner
Collaborator Author

> Awesome! Not at the desk right now to check, but I think you should be able to get rid of the noqa E402s now, as os is meant to be ignored (https://docs.astral.sh/ruff/rules/module-import-not-at-top-of-file/). I'm not sure about the I001 - it would be nice if this had a similar exception, but it's not documented.

Indeed! And I001 does not make exceptions.

@johannahaffner
Collaborator Author

I tried some benchmarking on our brand-new L-BFGS, and the current setup, which uses jax.jit exclusively, complains about the jaxpr in its state. I'll see if I can figure out a way around this that also works for compilation benchmarking.
