
Conversation

bagibence
Collaborator

Because JAXopt is no longer maintained, NeMoS is migrating its optimization backend to Optimistix and Optax.
As these are not yet a full replacement for JAXopt, solvers from both backends will be supported, at least initially.
This is achieved through a unified interface for solvers which defines the interaction between the optimization backend's solvers and NeMoS models.

The interface every solver must implement in order to be compatible with BaseRegressor is defined in AbstractSolver, and it mostly follows the previous interface of JAXopt solvers.
Compatibility with existing JAXopt, Optimistix, and (Prox-)SVRG solvers is provided by adapter classes; their common base class is SolverAdapter.

Instead of looking up solver classes based on their name in nemos.solvers and jaxopt, the class used to implement each algorithm is explicitly defined in the solver registry.
Currently solvers are created based on the algorithm's name, but this could be extended in the future to allow user-defined solvers to be passed to BaseRegressor. As long as a solver implements the AbstractSolver interface, it should be compatible with NeMoS. (In this case the compatibility check with the regularizer could be disabled or rewritten.)
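For illustration, the registry approach looks roughly like this (a minimal sketch with placeholder solver classes; only the `run` signature mirrors the actual AbstractSolver, everything else is made up):

```python
import abc

class AbstractSolver(abc.ABC):
    @abc.abstractmethod
    def run(self, init_params, *args):
        """Run the full optimization loop and return the final params and state."""

class MyGradientDescent(AbstractSolver):
    """Placeholder standing in for an actual adapter class."""
    def __init__(self, stepsize=1e-3):
        self.stepsize = stepsize

    def run(self, init_params, *args):
        return init_params, None  # dummy body; a real solver optimizes here

# algorithm name -> implementing class, instead of looking classes up by name
# in nemos.solvers / jaxopt
SOLVER_REGISTRY = {"GradientDescent": MyGradientDescent}

def create_solver(name, **solver_init_kwargs):
    return SOLVER_REGISTRY[name](**solver_init_kwargs)

solver = create_solver("GradientDescent", stepsize=1e-2)
```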

As Optimistix doesn't implement ProximalGradient, this is currently backed by a custom implementation, OptaxOptimistixProximalGradient, which runs Optax's SGD with linesearch followed by the proximal operator. This seems to work in practice, but is not as theoretically sound as jaxopt.ProximalGradient.
We contributed an implementation of LBFGS to Optimistix (PR for L-BFGS update, PR for zoom linesearch), and a wrapper class for it is already included here (just commented out). Until these PRs are merged and released, JAXopt's or Optax's implementation could be used as the default.
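For illustration, the SGD-then-prox scheme described above looks roughly like this (a simplified sketch: no linesearch, a made-up least-squares loss and lasso prox, not the actual OptaxOptimistixProximalGradient):

```python
import jax
import jax.numpy as jnp
import optax

def soft_threshold(x, thresh):
    # proximal operator of thresh * ||x||_1
    return jnp.sign(x) * jnp.maximum(jnp.abs(x) - thresh, 0.0)

def smooth_loss(params, X, y):
    return jnp.mean((X @ params - y) ** 2)

learning_rate, reg_strength = 1e-2, 1e-3
opt = optax.sgd(learning_rate)

X, y = jnp.ones((10, 3)), jnp.zeros(10)
params = jnp.zeros(3)
opt_state = opt.init(params)

for _ in range(100):
    grads = jax.grad(smooth_loss)(params, X, y)
    updates, opt_state = opt.update(grads, opt_state, params)
    params = optax.apply_updates(params, updates)
    # proximal step on the non-smooth penalty, scaled by the step size
    params = soft_threshold(params, learning_rate * reg_strength)
```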

Because in this interface every solver parameter must be passed to the solver's constructor, BaseRegressor.instantiate_solver is simplified. Each solver can expose its accepted arguments explicitly through get_accepted_arguments; if this is not defined, BaseRegressor tries to infer it.
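For example, the fallback inference could look something like this (a sketch; the actual logic in BaseRegressor may differ):

```python
import inspect

def accepted_arguments(solver_class):
    # prefer the explicit declaration if the solver provides one
    if hasattr(solver_class, "get_accepted_arguments"):
        return set(solver_class.get_accepted_arguments())
    # otherwise fall back to inspecting the constructor's signature
    params = inspect.signature(solver_class.__init__).parameters
    return {name for name in params if name not in ("self", "args", "kwargs")}
```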

There are a few differences between the parameters accepted by JAXopt and Optimistix. These differences are handled on solver instantiation, but a warning is raised (a sketch of the translation follows the list):

  • JAXopt uses maxiter, Optimistix uses max_steps
  • JAXopt uses a single tol value, while Optimistix uses atol and rtol
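A sketch of the kind of translation done at instantiation (the real mapping lives in the adapter classes, and the warning text here is illustrative):

```python
import warnings

def translate_solver_kwargs(kwargs, backend):
    kwargs = dict(kwargs)
    if backend == "optimistix":
        if "maxiter" in kwargs:
            warnings.warn("'maxiter' is translated to Optimistix's 'max_steps'")
            kwargs["max_steps"] = kwargs.pop("maxiter")
        if "tol" in kwargs:
            warnings.warn("'tol' is used for both 'atol' and 'rtol' in Optimistix")
            kwargs["atol"] = kwargs["rtol"] = kwargs.pop("tol")
    return kwargs
```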

Currently the NEMOS_SOLVER_BACKEND environment variable can be used to run tests with a specific backend. I added tox environments that run solver-dependent tests with both backends, and added these to the CI.
This switch could be moved to the main code, so that users have the option to choose the backend they want.
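If the switch is moved into the main code, it could be as simple as this (illustrative only; the function name and the default backend are assumptions):

```python
import os

def get_solver_backend():
    backend = os.environ.get("NEMOS_SOLVER_BACKEND", "optimistix")
    if backend not in ("jaxopt", "optimistix"):
        raise ValueError(f"Unknown solver backend: {backend!r}")
    return backend
```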

Collaborator

@BalzaniEdoardo BalzaniEdoardo left a comment


Big picture questions:

  1. Do we still need JAXopt? It looks like the transition would already be completed with this PR.
  2. How does this change how the user interacts with solvers? It looks like it doesn't. Is there a reason why the solver registry is public if one can only pass strings to specify solvers?
  3. Does this interface allow arbitrary Optimistix and Optax solvers? I.e., can we eventually allow passing an instance of a solver directly?
  4. For this PR we need some developer guide note explaining the interface: stuff like, do we expect any solver implementing the API to be compatible? Do you have any notes on how to write an interface to a new solver library, other than matching the API?
  5. Can you implement the proximal gradient in a separate PR?
  6. If we are not dropping JAXopt yet, then we could split the Optax/Optimistix changes into two separate PRs and add JAXopt to the registry instead of what we have now. But if you think that we can drop JAXopt here, then let's continue on this one.

Next week Billy is joining and we'll continue discussing this.

pass

@abc.abstractmethod
def run(self, init_params: Params, *args) -> StepResult:
Collaborator

Maybe have an abstract run_iterator that takes in a generator; for solvers that are not stochastic, raise a ValueError (or a more specific error if one exists).

Collaborator Author

Yes, that's great! I forgot about that.

Collaborator Author

I added it to AbstractSolver.
How do you want to control which solvers should have it and which shouldn't?
How about adding a base implementation copying jaxopt.StochasticSolver's, and only allowing it if the class has a class variable _stochastic=True?
Or a StochasticSolverMixin with the implementation, inherited by every class that should have it?
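Roughly something like this (a simplified sketch; init_state and update are assumed to exist on the concrete solver, and jaxopt's extra kwargs handling is left out):

```python
class StochasticSolverMixin:
    """Run the solver's update step over batches yielded by an iterator."""

    def run_iterator(self, init_params, iterator, *args, maxiter=100):
        params = init_params
        state = self.init_state(init_params, *args)
        for _, batch in zip(range(maxiter), iterator):
            params, state = self.update(params, state, *batch, *args)
        return params, state
```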

**solver_init_kwargs,
)

def _extend_args(self, args):
Collaborator

Can we avoid the if statement by storing the extended args? Or initializing as (reg_strength,) or an empty tuple?

Collaborator Author

Why do you want to avoid it? args are passed to the methods by the code using the solver, so they can't be stored at construction.
The prepending could be done in other ways; I like that this is explicit.

tags=self.config.tags,
)

self.stats.update(solution.stats)
Collaborator

Why do we have a self.stats here while the other adapter doesn't? Can we replicate the stats in the JAXopt adapter too? You can parse the JAXopt state and extract as much info as possible.

Collaborator Author

It's only because Optimistix saves the number of steps taken at the end of a minimise run in the stats instead of in the state itself like JAXopt does, and I wanted a quick way to access that in the tests.

I haven't given this too much thought, so we could think about whether we want to standardize what we keep at the end of the runs.

@bagibence
Collaborator Author

bagibence commented Jul 14, 2025

Thanks for taking a look!

Do we still need JAXopt? It looks like the transition would already be completed with this PR.

I think it's good to keep JAXopt at least as a backup option for a while, perhaps providing a way to switch the backend, like the environment variable mentioned above.

How does this change how the user interacts with solvers? It looks like it doesn't.

It doesn't. I aimed for no changes being required on the user's side, so as not to break existing analysis code.

Is there a reason why the solver registry is public if one can only pass strings to specify solvers?

Do you mean that it's not called _solver_registry?
Actually, overwriting the registry dict already gives users a way to plug in their own classes. I'm not sure if that's a feature or a bug.

Does this interface allow arbitrary Optimistix and Optax solvers? I.e., can we eventually allow passing an instance of a solver directly?

Do you mean passing an optimistix.GradientDescent directly? It doesn't allow that immediately, but I think it could easily be modified (in BaseRegressor) to allow passing instances of anything that implements the solver interface. So for example GLM(solver=OptimistixGradientDescent(learning_rate=1e-3)). Also, OptimistixWrapper could be modified to accept an already instantiated solver object instead of constructing _solver. Or we could just have a function that wraps an existing solver instance.

For this PR we need some developer guide note explaining the interface: stuff like, do we expect any solver implementing the API to be compatible? Do you have any notes on how to write an interface to a new solver library, other than matching the API?

Yes, that makes sense. I would expect any solver implementing the API to be compatible. The only hurdle I see is the compatibility check in the regularizers, which works based on the solver's name.
I think for a new library, subclassing SolverAdapter would be the sanest approach. So e.g. for Optax, an alternative to relying on Optimistix's OptaxMinimiser wrapper could be an OptaxWrapper whose run method implements the optimization loop itself, like done here.
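Very roughly, such a wrapper's run could look like this (a sketch only: a fixed-step for loop instead of a proper while loop with convergence checks, and the loss_fn handling is made up):

```python
import jax
import optax

class OptaxWrapper:
    def __init__(self, loss_fn, optimizer, max_steps=1000):
        self.loss_fn = loss_fn
        self.optimizer = optimizer
        self.max_steps = max_steps

    def run(self, init_params, *args):
        params = init_params
        opt_state = self.optimizer.init(params)
        for _ in range(self.max_steps):
            grads = jax.grad(self.loss_fn)(params, *args)
            updates, opt_state = self.optimizer.update(grads, opt_state, params)
            params = optax.apply_updates(params, updates)
        return params, opt_state
```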

Can you implement the proximal gradient in a separate PR?

Sure!

If we are not dropping JAXopt yet, then we could split the Optax/Optimistix changes into two separate PRs and add JAXopt to the registry instead of what we have now. But if you think that we can drop JAXopt here, then let's continue on this one.

Yes, that also makes sense. I think it's useful to keep JAXopt for now and only drop it once the switch is done and you are satisfied with it.

Edit: quote formatting was off

bagibence added 25 commits July 29, 2025 14:36
Defines the interface.
Will add type annotations later.
Instead of trying to find solvers in packages, have an explicit list.
According to the new interface, __init__ will receive everything, and will raise an error at that point.
Simplify how the solver object is accessed in test_solvers and test_glm.
Cleanup for previous commit
@bagibence
Collaborator Author

I removed things related to stochastic optimization; this will be done later in #376

Bence Bagi and others added 10 commits August 7, 2025 12:55
Currently using PyTree from jaxtyping. Might want to just alias to Any.
Moved solver interface types to typing.
Use StepResult and SolverState in GLM and BaseRegressor.
For now PyTree is imported as Pytree to not introduce too many changes.
This PyTree is parametrizable, which is used in the solvers module.
ArrayLike is the same as jax.typing.ArrayLike. Importing from jaxtyping for consistency.
I will open an issue about potentially using jaxtyping.
@bagibence bagibence mentioned this pull request Aug 14, 2025
Collaborator

@BalzaniEdoardo BalzaniEdoardo left a comment

Let's have @billbrod take a look, but after you address my latest comments, this LGTM.


## Background

In the beginning NeMoS relied on [JAXopt](https://jaxopt.github.io/stable/) as its optimization backend.
Collaborator

Suggested change
In the beginning NeMoS relied on [JAXopt](https://jaxopt.github.io/stable/) as its optimization backend.
In the earlier versions, NeMoS relied on [JAXopt](https://jaxopt.github.io/stable/) as its optimization backend.

│ └─ Concrete Subclass WrappedProxSVRG
```

`OptaxOptimistixSolver` is for using Optax solvers, utilizing `optimistix.OptaxMinimiser` to run the full optimization loop.
Collaborator

Suggested change
`OptaxOptimistixSolver` is for using Optax solvers, utilizing `optimistix.OptaxMinimiser` to run the full optimization loop.
`OptaxOptimistixSolver` is an adapter for Optax solvers, relying on `optimistix.OptaxMinimiser` to run the full optimization loop.

```

`OptaxOptimistixSolver` is for using Optax solvers, utilizing `optimistix.OptaxMinimiser` to run the full optimization loop.
Optimistix does not have implementations of Nesterov acceleration, so gradient descent is implemented by wrapping `optax.sgd` which does support it.
Collaborator

Can you add a reference to the Nesterov acceleration?

To run stochastic (~mini-batch) optimization, JAXopt used a `run_iterator` method.
Instead of the full input data `run_iterator` accepts a generator / iterator that provides batches of data.

For solvers defined in `nemos` that can be used this way, we will likely provide `StochasticMixin` which borrows the implementation from JAXopt. (Or some version of it. See below.).
Collaborator

Suggested change
For solvers defined in `nemos` that can be used this way, we will likely provide `StochasticMixin` which borrows the implementation from JAXopt. (Or some version of it. See below.).
For solvers defined in `nemos` that can be used this way, we will likely provide `StochasticMixin` which borrows the implementation from JAXopt (Or some version of it, see below).

We will likely define an interface or protocol for this, allowing custom (user-defined) solvers to also implement their own version.
We will also have to decide on how this will be exposed to users on the level of `BaseRegressor` and `GLM`.

Note that (Prox-)SVRG is especially well-suited for running stochastic optimization, however it currently requires the optimization loop to be implemented separately as it is a bit more involved than what is done by `run_iterator`.
Collaborator

Wrap this in an admonition on SVRG, to make it more visible.


Currently, the solver registry defines which implementation to use for each algorithm, so that has to be overwritten in order to tell NeMoS to use a custom class.

This is hacky and not an intended use-case for now, but in the future we are [planning to support passing any solver to `BaseRegressor`](https://github.com/flatironinstitute/nemos/issues/378).
Collaborator

I am not sure it makes sense to have a guideline for something we do not recommend. It might be better to copy-paste this note into issue #378.

split_indices = np.cumsum(sizes)[:-1]
flat_params = np.concat([x.flatten() for x in params])
def unpacker(_flat_params):
Collaborator

This could be generalized and included in nemos as a utility function that we expose. It's useful in general:

import jax
import jax.numpy as jnp

def get_packer_unpacker(parameter_tree):
    flat, struct = jax.tree_util.tree_flatten(parameter_tree)
    shapes = [x.shape for x in flat]
    sizes = jnp.array([x.size for x in flat], dtype=int)
    split_indices = jnp.cumsum(sizes[:-1])

    def packer(parameter_tree):
        flat = jax.tree_util.tree_leaves(parameter_tree)
        return jnp.concatenate([x.flatten() for x in flat])

    def unpacker(flat_params):
        split_params = jnp.split(flat_params, split_indices)
        split_params = [x.reshape(s) for x, s in zip(split_params, shapes)]
        return jax.tree_util.tree_unflatten(struct, split_params)

    return packer, unpacker
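For example, with a made-up parameter tree the round-trip would be:

```python
params = {"coef": jnp.ones((2, 3)), "intercept": jnp.zeros(2)}
packer, unpacker = get_packer_unpacker(params)
flat = packer(params)        # 1-D array of length 8
restored = unpacker(flat)    # same structure and shapes as params
```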

Comment on lines +1688 to +1694
# NOTE not testing these anymore since non-JAXopt solvers' state is not necessarily a namedtuple
# assert (
# hasattr(state, "_fields")
# and hasattr(state, "_field_defaults")
# and hasattr(state, "_asdict")
# )

Collaborator

remove note

Comment on lines +1652 to +1657
# NOTE not testing these anymore since non-JAXopt solvers' state is not necessarily a namedtuple
# assert (
# hasattr(state, "_fields")
# and hasattr(state, "_field_defaults")
# and hasattr(state, "_asdict")
# )
Collaborator

remove note

setenv =
NEMOS_SOLVER_BACKEND = jaxopt
commands =
pytest -n auto tests/test_solvers.py tests/test_glm.py tests/test_convergence.py tests/test_regularizer.py
Collaborator

I know it is a pain, but can you group all GLM tests that call fit, initialize the solver, or interact with the solver in any other way into a dedicated class, and run only that class?

    pytest -n auto tests/test_solvers.py tests/test_glm.py::TestSolverRelated tests/test_convergence.py tests/test_regularizer.py
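e.g., roughly (class and fixture names here are placeholders):

```python
class TestSolverRelated:
    """GLM tests that call fit, initialize the solver, or otherwise touch it."""

    def test_fit_runs(self, glm_instance, X, y):  # hypothetical fixtures
        glm_instance.fit(X, y)
```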
