Unified solver interface for compatibility with JAXopt and Optimistix #365

# The `solvers` Module

## Background

In the beginning, NeMoS relied on [JAXopt](https://jaxopt.github.io/stable/) as its optimization backend.
As JAXopt is no longer maintained, we added support for alternative optimization backends.

Some of JAXopt's functionality was ported to [Optax](https://optax.readthedocs.io/en/latest/) by Google, and [Optimistix](https://docs.kidger.site/optimistix/) was started by the community to fill the gaps left by JAXopt's deprecation.

To support flexibility and long-term maintenance, NeMoS now has a backend-agnostic solver interface, allowing the use of solvers from different backend libraries with different interfaces.

## `AbstractSolver` interface
This interface is defined by [`AbstractSolver`](nemos.solvers.AbstractSolver) and mostly follows the JAXopt API.
All solvers implemented in NeMoS are subclasses of `AbstractSolver`; however, subclassing is not required for implementing solvers that can be used with NeMoS (see [custom solvers](#custom_solvers)).

The `AbstractSolver` interface requires implementing the following methods:
- `__init__`: all solver parameters and settings should go here. The other methods only take the solver state, the current or initial solution (model parameters), and the input data for the objective function.
- `init_state`: initialize the solver state.
- `update`: take one step of the optimization algorithm.
- `run`: run a full optimization.
- `get_accepted_arguments`: the set of argument names that can be passed to `__init__`. These are the parameters users can change by passing `solver_kwargs` to `BaseRegressor` / `GLM`.
- `get_optim_info`: collect diagnostic information about the optimization run into an `OptimizationInfo` namedtuple.

`AbstractSolver` is a generic class parametrized by `SolverState` and `StepResult`.
In concrete subclasses, `SolverState` should be the type of the solver state, and `StepResult` the type of what is returned by each step of the solver, typically a tuple of the parameters and the solver state.
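
As an illustration, a minimal fixed-stepsize gradient descent written against this interface might look roughly like the sketch below. The method signatures are assumptions based on the description above and on the JAXopt API; the exact signatures required by `AbstractSolver` may differ.

```python
# Hypothetical sketch only: the signatures are modeled on the JAXopt-style
# layout described above, not copied from nemos.solvers.AbstractSolver.
from typing import Any, NamedTuple

import jax


class GDState(NamedTuple):
    """Solver state: here just an iteration counter."""

    iter_num: int


class ToyGradientDescent:
    """Fixed-stepsize gradient descent following the interface sketched above."""

    def __init__(self, fun, stepsize: float = 1e-2, maxiter: int = 100):
        # all solver parameters and settings live in __init__
        self.fun = fun
        self.stepsize = stepsize
        self.maxiter = maxiter

    def init_state(self, init_params, *args) -> GDState:
        return GDState(iter_num=0)

    def update(self, params, state, *args):
        # one optimization step; the return value is the StepResult:
        # a (params, state) tuple
        grads = jax.grad(self.fun)(params, *args)
        new_params = jax.tree_util.tree_map(
            lambda p, g: p - self.stepsize * g, params, grads
        )
        return new_params, GDState(iter_num=state.iter_num + 1)

    def run(self, init_params, *args):
        # a full optimization loop built from init_state + update
        params, state = init_params, self.init_state(init_params, *args)
        for _ in range(self.maxiter):
            params, state = self.update(params, state, *args)
        return params, state

    @classmethod
    def get_accepted_arguments(cls) -> set[str]:
        return {"stepsize", "maxiter"}

    def get_optim_info(self) -> dict[str, Any]:
        # in NeMoS this would be packed into an OptimizationInfo namedtuple
        return {"num_steps": self.maxiter, "converged": None}
```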

### Optimization info
Because different libraries store information about the optimization run in different places, we decided to standardize some common diagnostics.
Optimistix saves some of these in its stats dict, while Optax and JAXopt store them in their solver state.
These diagnostics are saved in `solver.optimization_info`, which is of type `OptimizationInfo`.

`OptimizationInfo` holds the following fields:
- `function_val`: the final value of the objective function. As not all solvers store this by default, and it is potentially expensive to evaluate, this field is optional.
- `num_steps`: the number of steps taken by the solver.
- `converged`: whether the optimization converged according to the solver's criteria.
- `reached_max_steps`: whether the solver reached the maximum number of steps allowed.
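
The following sketch shows the shape of this namedtuple and how the diagnostics can be checked after a run; the actual definition in `nemos.solvers` may differ in details such as defaults and typing.

```python
# Sketch of the fields listed above, not the actual nemos definition.
from typing import NamedTuple, Optional


class OptimizationInfo(NamedTuple):
    function_val: Optional[float]  # optional: may be expensive or not stored
    num_steps: int
    converged: bool
    reached_max_steps: bool


# After a full optimization, diagnostics can be inspected uniformly across
# backends, e.g.:
#
#     info = solver.optimization_info
#     if not info.converged and info.reached_max_steps:
#         print(f"stopped after {info.num_steps} steps without converging")
```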

## Adapters
Support for existing solvers from external libraries, and for the custom implementation of (Prox-)SVRG, is provided through adapters that "translate" between the interfaces of these external solvers and the `AbstractSolver` interface.

Adapters for existing solvers can be written in multiple ways.
In our experience, wrapping solver objects through adapters is a clean way of doing this, and we recommend that adapters for new optimization libraries follow this pattern.

[`SolverAdapter`](nemos.solvers.SolverAdapter) provides methods for wrapping existing solvers.
Each subclass of `SolverAdapter` has to define the methods of `AbstractSolver`, as well as a `_solver_cls` class variable signaling the type of solver it wraps.
During construction it has to set a `_solver` attribute that is a concrete instance of `_solver_cls`.

Default method implementations:
- A default implementation of `get_accepted_arguments` is provided, returning the arguments to `__init__`, `_solver_cls`, and `_solver_cls.__init__`, and discarding the ones required by `AbstractSolver.__init__`.
- `__getattr__` dispatches every attribute access to the wrapped `_solver`.
- `__init_subclass__` generates a docstring for the adapter, including the accepted arguments and the wrapped solver's documentation.

Currently we provide adapters for two optimization backends:
- [`OptimistixAdapter`](nemos.solvers.OptimistixAdapter) wraps Optimistix solvers.
- [`JaxoptAdapter`](nemos.solvers.JaxoptAdapter) wraps JAXopt solvers. As `SVRG` and `ProxSVRG` follow the JAXopt interface, these are also wrapped with `JaxoptAdapter`.
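
The skeleton of a new adapter might look roughly like the sketch below. The wrapped backend solver here is a stand-in defined inline, and the method signatures are assumptions; the real `SolverAdapter` base class may require additional arguments or hooks.

```python
# Hypothetical sketch of a new adapter.  ToyBackendLBFGS stands in for a solver
# class from an external library with a JAXopt-like interface.
class ToyBackendLBFGS:
    """Stand-in for an external backend solver."""

    def __init__(self, fun, maxiter: int = 100):
        self.fun = fun
        self.maxiter = maxiter

    def init_state(self, init_params, *args): ...
    def update(self, params, state, *args): ...
    def run(self, init_params, *args): ...


class ToyBackendAdapter:  # in nemos this would subclass nemos.solvers.SolverAdapter
    # class variable naming the wrapped solver type
    _solver_cls = ToyBackendLBFGS

    def __init__(self, fun, **solver_kwargs):
        # construct and store a concrete instance of _solver_cls;
        # SolverAdapter.__getattr__ forwards attribute access to self._solver
        self._solver = self._solver_cls(fun, **solver_kwargs)

    def init_state(self, init_params, *args):
        return self._solver.init_state(init_params, *args)

    def update(self, params, state, *args):
        return self._solver.update(params, state, *args)

    def run(self, init_params, *args):
        return self._solver.run(init_params, *args)

    def get_optim_info(self):
        # translate backend-specific state/stats into OptimizationInfo here
        raise NotImplementedError
```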

## List of available solvers

```
Abstract Class AbstractSolver
│
├─ Abstract Subclass SolverAdapter
│   │
│   ├─ Abstract Subclass OptimistixAdapter
│   │   │
│   │   ├─ Concrete Subclass OptimistixBFGS
│   │   ├─ Concrete Subclass OptimistixLBFGS
│   │   ├─ Concrete Subclass OptimistixNonlinearCG
│   │   └─ Concrete Subclass OptaxOptimistixSolver
│   │       │
│   │       ├─ Concrete Subclass OptaxOptimistixLBFGS
│   │       ├─ Concrete Subclass OptaxOptimistixGradientDescent
│   │       └─ Concrete Subclass OptaxOptimistixProximalGradient
│   │
│   └─ Abstract Subclass JaxoptAdapter
│       │
│       ├─ Concrete Subclass JaxoptLBFGS
│       ├─ Concrete Subclass JaxoptGradientDescent
│       ├─ Concrete Subclass JaxoptProximalGradient
│       ├─ Concrete Subclass JaxoptBFGS
│       ├─ Concrete Subclass JaxoptNonlinearCG
│       │
│       ├─ Concrete Subclass WrappedSVRG
│       └─ Concrete Subclass WrappedProxSVRG
```

`OptaxOptimistixSolver` is for using Optax solvers, utilizing `optimistix.OptaxMinimiser` to run the full optimization loop.

Optimistix does not have an implementation of Nesterov acceleration, so gradient descent is implemented by wrapping `optax.sgd`, which does support it.

Note that `OptaxOptimistixSolver` allows using any solver from Optax (e.g., Adam). See `OptaxOptimistixGradientDescent` for a template of how to wrap new Optax solvers.
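
The mechanism this builds on can be sketched directly with Optax and Optimistix; the loss, data, and tolerance values below are illustrative, and the actual `OptaxOptimistixSolver` constructor may expose different arguments.

```python
# Sketch: an Optax optimizer is wrapped in optimistix.OptaxMinimiser, and
# Optimistix drives the full optimization loop.
import jax.numpy as jnp
import optax
import optimistix as optx


def loss(params, data):
    X, y = data
    return jnp.mean((X @ params - y) ** 2)


# any Optax optimizer can be plugged in, e.g. Adam, or SGD with Nesterov
# acceleration (which Optimistix itself does not implement):
optim = optax.sgd(learning_rate=1e-2, momentum=0.9, nesterov=True)
# optim = optax.adam(learning_rate=1e-3)

solver = optx.OptaxMinimiser(optim, rtol=1e-6, atol=1e-6)

X = jnp.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]])
y = jnp.array([1.0, 2.0, 3.0, 6.0])
sol = optx.minimise(loss, solver, jnp.zeros(3), args=(X, y), max_steps=10_000)
print(sol.value)  # approximately [1., 2., 3.]
```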

## Custom solvers
If you want to use your own solver in `nemos`, you just have to write a solver that adheres to the `AbstractSolver` interface, and it should be straightforward to plug in.
While it is not required, subclassing `AbstractSolver` is one way to ensure adherence to the interface.

Currently, the solver registry defines which implementation to use for each algorithm, so it has to be overwritten to tell `nemos` to use a custom class; in the future we are [planning to support passing any solver to `BaseRegressor`](https://github.com/flatironinstitute/nemos/issues/378).

We might also define an `ImplementsSolverInterface` protocol to easily check whether user-supplied solvers define the methods required by the interface.
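
One possible shape for such a protocol, as a sketch rather than an existing nemos API:

```python
# Structural check for user-supplied solvers; names and signatures are illustrative.
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ImplementsSolverInterface(Protocol):
    def init_state(self, init_params, *args) -> Any: ...

    def update(self, params, state, *args) -> Any: ...

    def run(self, init_params, *args) -> Any: ...

    def get_optim_info(self) -> Any: ...


# `isinstance(user_solver, ImplementsSolverInterface)` then gives a cheap check
# that the required methods exist (runtime_checkable protocols only check
# method names, not signatures).
```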

## Stochastic optimization
To run stochastic (i.e., mini-batch) optimization, JAXopt used a `run_iterator` method.
Instead of the full input data, `run_iterator` accepts a generator / iterator that yields batches of data.
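
Roughly, and not as JAXopt's exact implementation, the pattern looks like the following sketch: batches are pulled from an iterator and fed to the solver's `update`, one batch per step.

```python
# Sketch of the run_iterator pattern; the solver is any object with
# init_state/update methods as described earlier in this document.
from typing import Iterator, Tuple

import jax.numpy as jnp


def batch_generator(X, y, batch_size: int) -> Iterator[Tuple[jnp.ndarray, jnp.ndarray]]:
    """Yield consecutive mini-batches of the data, cycling forever."""
    n = X.shape[0]
    start = 0
    while True:
        stop = min(start + batch_size, n)
        yield X[start:stop], y[start:stop]
        start = stop if stop < n else 0


def run_iterator_sketch(solver, init_params, batches, maxiter: int):
    """Drive a JAXopt-style solver with mini-batches instead of the full data."""
    params = init_params
    # initialize the state from the first batch (an assumption of this sketch)
    state = solver.init_state(init_params, *next(batches))
    for _ in range(maxiter):
        X_batch, y_batch = next(batches)
        params, state = solver.update(params, state, X_batch, y_batch)
    return params, state
```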

For solvers defined in `nemos` that can be used this way, we will likely provide a `StochasticMixin` that borrows the implementation (or some version of it; see below) from JAXopt.

We will likely define an interface or protocol for this, allowing custom (user-defined) solvers to implement their own version as well.
We will also have to decide how this will be exposed to users at the level of `BaseRegressor` and `GLM`.

Note that (Prox-)SVRG is especially well suited for stochastic optimization; however, it currently requires the optimization loop to be implemented separately, as it is a bit more involved than what `run_iterator` does.

A potential solution would be to provide a separate method that accepts the full data and takes care of the batching. That might also be a more convenient alternative to the current `run_iterator`.
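
A hypothetical convenience method of this kind might look like the sketch below; the name, signature, and batching strategy are illustrative only.

```python
# Hypothetical: take the full data plus a batch size and handle the batching
# internally, instead of requiring a user-supplied iterator.
import jax


def run_batched(solver, init_params, X, y, batch_size: int, maxiter: int, seed: int = 0):
    """Shuffle indices each epoch and step over mini-batches (illustrative only)."""
    n = X.shape[0]
    key = jax.random.PRNGKey(seed)
    params = init_params
    state = solver.init_state(init_params, X[:batch_size], y[:batch_size])
    step = 0
    while step < maxiter:
        key, subkey = jax.random.split(key)
        perm = jax.random.permutation(subkey, n)
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            params, state = solver.update(params, state, X[idx], y[idx])
            step += 1
            if step >= maxiter:
                break
    return params, state
```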

## Note on line searches vs. fixed stepsize in Optimistix
By default, Optimistix does not expose the `search` attribute of concrete solvers, but we might want to switch flexibly between line searches and constant learning rates depending on whether `stepsize` is passed to the solver.

A solution would be to create short redefinitions of the required solvers that take `search` as an argument to `__init__`, and to handle `stepsize` in the adapter with something like:
```python
# Sketch based on Optimistix's BFGS, redefined so that `search` is exposed as a
# constructor argument.  Names such as AbstractBFGS, NewtonDescent, Zoom,
# max_norm, AbstractSearch, and `lx` (lineax) are assumed to be imported from
# Optimistix / lineax; some of them live in Optimistix's internal modules.
class BFGS(AbstractBFGS[Y, Aux, _Hessian]):
    rtol: float
    atol: float
    norm: Callable[[PyTree], Scalar]
    use_inverse: bool
    descent: NewtonDescent
    search: AbstractSearch
    verbose: frozenset[str]

    def __init__(
        self,
        rtol: float,
        atol: float,
        norm: Callable[[PyTree], Scalar] = max_norm,
        use_inverse: bool = True,
        verbose: frozenset[str] = frozenset(),
        search: AbstractSearch = Zoom(initial_guess_strategy="one"),
    ):
        self.rtol = rtol
        self.atol = atol
        self.norm = norm
        self.use_inverse = use_inverse
        self.descent = NewtonDescent(linear_solver=lx.Cholesky())
        self.search = search
        self.verbose = verbose
```

and

```python
# `optx` is Optimistix; LearningRate is a "search" that always uses a fixed step.
if "stepsize" in solver_init_kwargs:
    assert "search" not in solver_init_kwargs, "Specify either search or stepsize"
    solver_init_kwargs["search"] = optx.LearningRate(
        solver_init_kwargs.pop("stepsize")
    )
```
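
For illustration, assuming the redefined `BFGS` from the first snippet is in scope, this adapter-level translation would let users configure either variant:

```python
import optimistix as optx

# default: the Zoom line search set in the redefined BFGS above
line_search_bfgs = BFGS(rtol=1e-6, atol=1e-6)

# a stepsize passed at the adapter level would translate into a fixed-step "search"
fixed_step_bfgs = BFGS(rtol=1e-6, atol=1e-6, search=optx.LearningRate(1e-2))
```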

A second file in this diff adds the new page to the documentation index:

```
04-basis_module.md
05-observation_models.md
06-regularizer.md
07-solvers.md
```