Notes on errorbar enhancements

Here are some notes on planned changes to error bar specification.

Currently, what the error bars show is controlled through the `ci` parameter. This can be either a number, setting the width of a bootstrap confidence interval, or the string `"sd"`, indicating that the error bar covers +/- the standard deviation of the data around the estimate.

Some problems with this have been routinely noted:
- There's no option for parametric confidence intervals/standard error
- There's no option for showing a measure of data spread other than +/- 1 sigma
- `ci="sd"` does not really make conceptual sense as a parametrization (it is the result of a short-sighted API decision)

In effect, you can think of the options as having a 2D taxonomy defined by whether the error bars show a measure of estimate certainty or data spread and whether the computation is parametric or nonparametric. Currently, we occupy two cells in this matrix:

| | Estimate certainty | Data spread |
|-|-|-|
|Parametric| | `ci="sd"`|
|Nonparametric| `ci=95`, `ci=68`, etc. | |

I would like to fill out the matrix. But we have a few challenges:

- As mentioned, the current API is not great, and overloading the meaning of `ci` further is a nonstarter
- There is no centralized location in the code where this parameter is interpreted and used

Plans for the new API will involve a new parameter, probably called `errorbar` but possibly `error`, `errbar` or some other shorthand, that accepts a tuple of the form `(kind, level)`. The first element determines what the error bars show, and the second parametrizes them. One proposal is to fill out the space like this:

| | Estimate certainty | Data spread |
|-|-|-|
| Parametric | `("se", scale)` | `("sd", scale)` |
| Nonparametric | `("ci", size)` | `("pi", size)` |

IMO, there is a lot of sense to this. You have four options for `kind`, each named using a bigram initialism. There are two kinds of level parameters:
- `scale`: multiplicatively scales a parametric error metric (e.g. `("sd", 3)` gives you a 3-sigma error bar, `("se", 1.96)` gives you a ~95% parametric confidence interval)
- `size`: sets the size of a nonparametric interval, expressed as a proportion, with percentiles (of the bootstrap distribution for `ci` and the input data for `pi`) of `(1 - size) / 2, 1 - (1 - size) / 2`
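
To make the proposed semantics concrete, here is a minimal sketch of how a `(kind, level)` pair could map to an interval around the mean. Everything in it is an assumption for illustration: the function names `error_interval` and `percentile`, the `n_boot` and `seed` defaults, the choice of the mean as the estimator, and the reading of `size` as a proportion between 0 and 1. It uses only the standard library so it can be run on its own; a real implementation would presumably use numpy.

```python
import random
import statistics


def percentile(values, q):
    """Linearly interpolated percentile, with q given as a proportion in [0, 1]."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def error_interval(data, kind, level, n_boot=1000, seed=0):
    """Return (low, high) around the mean for a hypothetical (kind, level) spec."""
    center = statistics.fmean(data)

    if kind in ("sd", "se"):
        # Parametric: `level` is a multiplicative scale on the error metric.
        spread = statistics.stdev(data)
        if kind == "se":
            spread /= len(data) ** 0.5
        return center - level * spread, center + level * spread

    if kind in ("pi", "ci"):
        # Nonparametric: `level` is the interval size as a proportion.
        if kind == "ci":
            # Percentiles of the bootstrap distribution of the mean.
            rng = random.Random(seed)
            values = [
                statistics.fmean(rng.choices(data, k=len(data)))
                for _ in range(n_boot)
            ]
        else:
            # Percentiles of the input data itself.
            values = data
        tail = (1 - level) / 2
        return percentile(values, tail), percentile(values, 1 - tail)

    raise ValueError(f"unknown kind: {kind!r}")
```

With this shape, `error_interval(data, "sd", 3)` would give a 3-sigma bar and `error_interval(data, "pi", 0.5)` the interquartile range of the data.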

There are also some potential drawbacks:
- `"pi"` (i.e., "percentile interval") doesn't seem to be a commonly used term for a nonparametric measure of data spread. Actually I'm not sure there really is a term in the stats literature for such an interval, even though it's a very reasonable thing to plot (e.g. #1501)
- If you really want parametric 95% confidence intervals, this parametrization leaves you limited to a Z interval (and requires you to understand how to construct one from a standard error)
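
To illustrate the second drawback, this is roughly what a user would have to do to get a parametric 95% interval under the proposed parametrization: look up the normal critical value and pass it as the scale. The helper name `z_scale` is hypothetical, and the `("se", scale)` tuple is the proposed spec, not an existing API.

```python
from statistics import NormalDist


def z_scale(confidence):
    """Two-sided standard normal critical value for a confidence proportion."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


# Roughly 1.96 for a 95% interval; this is the number the user must know to supply.
scale = z_scale(0.95)
errorbar = ("se", scale)  # hypothetical spec from the proposal above
```

Note that this only ever yields a Z interval. A t interval would additionally need the degrees of freedom, which a bare `scale` has no way to express.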

API decisions aside, the right implementation is going to take some thinking. Currently each module does its own errorbar computations. Most errorbars appear in the context of an aggregation-with-estimator operation. This can likely be abstracted. The other place they show up is in the regression module, where error bars are shown around the regression line. This needs to be handled differently, but statsmodels now has the `get_prediction` method which will do a lot of the work for us. We'll need a general enough implementation such that we can handle special cases (like logistic regression, where the SE/SD scaling should happen in logit space).
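
As a sketch of why the logistic case is special: if the SE scaling is applied on the linear predictor and only then mapped through the inverse logit, the band stays inside (0, 1) and is asymmetric near the boundaries. The function names and inputs below are hypothetical; `linear_pred` and `linear_se` stand in for whatever the fitted model reports on the logit scale.

```python
import math


def expit(x):
    """Inverse logit: map a value on the logit scale to a probability."""
    return 1 / (1 + math.exp(-x))


def logistic_band(linear_pred, linear_se, scale):
    """Scale the standard error in logit space, then transform to probabilities."""
    low = expit(linear_pred - scale * linear_se)
    high = expit(linear_pred + scale * linear_se)
    return low, high
```

For example, `logistic_band(4.0, 2.0, 1.96)` produces an upper bound below 1, whereas adding a scaled standard error directly on the probability scale can overshoot it.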

Here are some assorted open questions:

- Should we accept simple strings (e.g. `errorbar="sd"`) with a default level value used internally?
- This simple four-option system is still fairly limiting; it may disappoint those who would like to be able to use a generic function to get error bars (e.g. #2332). What might that API look like?
- Is it a sensible API option for `sd` to correspond to the prediction interval in a regression model?
- Should standard error correspond to the estimator and raise if the one used doesn't have a defined standard error? In other words, what would we do with `estimator="median", errorbar="se"`? And if the estimator is a callable, should we use its name to associate with the correct standard error function?
- It would be nice to have seaborn support multiple error bars from a sequence of `level` parameters, e.g. 1-2-3 sigma bands or 68-95-99 CIs (e.g. #1492). I like this kind of plot, but each plotting function will have to define its own logic for showing multiple error bars (e.g. layered alpha for error bands in `lineplot`, lines of diminishing width in `pointplot`). But still, if it's going to happen, we should at least plan for it here.
- What about additional arguments for bootstrapping (i.e. `n_boot`, `seed`)? It would be nice to reduce the number of parameters in the main function signatures, but I would like to keep the argument for `errorbar` a simple tuple and not a more complex object that could take optional parameters. I think...
- What about `loess`? (#552). Bootstrapping is still very slow, but statsmodels seems to still not have analytic confidence bands.
- What's the right order of operations for working on this? It should probably not be (fully) implemented until the categorical/regression modules can be refactored to use the core objects (where this should be handled).
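
One way to picture the first two questions is a single normalization step that accepts a bare string, a `(kind, level)` tuple, or a generic callable. This is only a sketch: the function name `normalize_errorbar` and the default levels in `DEFAULT_LEVELS` are placeholders I made up, not decisions.

```python
# Assumed defaults, purely illustrative.
DEFAULT_LEVELS = {"se": 1, "sd": 1, "ci": 0.95, "pi": 0.95}


def normalize_errorbar(spec):
    """Return (kind, level); kind is a string or a callable, level may be None."""
    if spec is None:
        # No error bars requested.
        return None, None
    if callable(spec):
        # Generic function: it is expected to compute the interval itself.
        return spec, None
    if isinstance(spec, str):
        kind, level = spec, None
    else:
        try:
            kind, level = spec
        except (TypeError, ValueError) as err:
            raise ValueError(
                "errorbar must be a string, a (kind, level) tuple, or a callable"
            ) from err
    if not isinstance(kind, str) or kind not in DEFAULT_LEVELS:
        raise ValueError(f"kind must be one of {sorted(DEFAULT_LEVELS)}")
    if level is None:
        level = DEFAULT_LEVELS[kind]
    return kind, level
```

Having one function like this would also give the parameter the centralized interpretation point that the current code lacks.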
