Notes on errorbar enhancements #2403

@mwaskom

Here are some notes on planned changes to error bar specification.

Currently, what the error bars show is controlled through the ci parameter. This can be either a number, setting the width of a bootstrap confidence interval, or the string "sd", indicating that the error bar covers +/- one standard deviation of the data around the estimate value.

Some problems with this have been routinely noted:

  • There's no option for parametric confidence intervals/standard error
  • There's no option for showing a measure of data spread other than +/- 1 sigma
  • ci="sd" does not really make conceptual sense as a parametrization (it is the result of a short-sighted API decision)

In effect, you can think of the options as having a 2D taxonomy defined by whether the error bars show a measure of estimate certainty or data spread and whether the computation is parametric or nonparametric. Currently, we occupy two cells in this matrix:

|               | Estimate certainty | Data spread |
|---------------|--------------------|-------------|
| Parametric    |                    | ci="sd"     |
| Nonparametric | ci=95, ci=68, etc. |             |

I would like to fill out the matrix. But we have a few challenges:

  • As mentioned, the current API is not great, and overloading the meaning of ci further is a nonstarter
  • There is no centralized location in the code where this parameter is interpreted and used

Plans for the new API will involve a new parameter, probably called errorbar but possibly error, errbar or some other shorthand, that accepts a tuple of the form (kind, level). The first element determines what the error bars show, and the second parametrizes them. One proposal is to fill out the space like this:

|               | Estimate certainty | Data spread   |
|---------------|--------------------|---------------|
| Parametric    | ("se", scale)      | ("sd", scale) |
| Nonparametric | ("ci", size)       | ("pi", size)  |

IMO, there is a lot of sense to this. You have four options for kind, each named using a bigram initialism. There are two kinds of level parameters:

  • scale: multiplicatively scales a parametric error metric (e.g. ("sd", 3) gives you a 3-sigma error bar, ("se", 1.96) gives you a ~95% parametric confidence interval)
  • size: sets the size of a nonparametric interval, bounded by the percentiles (1 - size) / 2 and 1 - (1 - size) / 2 of the bootstrap distribution (for ci) or of the input data (for pi)
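
To make the four cells concrete, here is a rough NumPy sketch of how each (kind, level) pair might map to an interval. The function name error_interval and its signature are hypothetical and not part of seaborn. It assumes size is a fraction in (0, 1), as the percentile formula above suggests, and it assumes "se" means the standard error of the mean, which is itself one of the open questions below.

```python
import numpy as np


def error_interval(data, kind, level, estimator=np.mean, n_boot=1000, seed=None):
    """Return (low, high) for one (kind, level) pair. Hypothetical helper."""
    data = np.asarray(data, dtype=float)
    if kind not in {"se", "sd", "ci", "pi"}:
        raise ValueError(f"unknown errorbar kind: {kind!r}")

    estimate = estimator(data)

    # Parametric kinds: level is a multiplicative scale on the error metric.
    if kind == "sd":
        half_width = level * data.std(ddof=1)
        return estimate - half_width, estimate + half_width
    if kind == "se":
        # Assumption: standard error of the mean, whatever the estimator is.
        half_width = level * data.std(ddof=1) / np.sqrt(len(data))
        return estimate - half_width, estimate + half_width

    # Nonparametric kinds: level is an interval size, assumed to be in (0, 1).
    tail = (1 - level) / 2
    if kind == "pi":
        # Percentiles of the input data itself.
        sample = data
    else:
        # kind == "ci": percentiles of the bootstrap distribution of the estimate.
        rng = np.random.default_rng(seed)
        idx = rng.integers(0, len(data), size=(n_boot, len(data)))
        sample = np.apply_along_axis(estimator, 1, data[idx])
    low, high = np.quantile(sample, [tail, 1 - tail])
    return low, high
```

For example, error_interval(x, "sd", 3) would give a 3-sigma bar and error_interval(x, "pi", 0.5) the interquartile range of x.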

There are also some potential drawbacks:

  • "pi" (i.e., "percentile interval") doesn't seem to be a commonly used term for a nonparametric measure of data spread. Actually I'm not sure there really is a term in the stats literature for such an interval, even though it's a very reasonable thing to plot (e.g. Support plotting quantiles of the data distribution with lineplot #1501)
  • If you really want parametric 95% confidence intervals, this parametrization leaves you limited to a Z interval (and requires you to understand how to construct one from a standard error)
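
As an illustration of the second drawback, this is roughly what a user would have to do to turn a confidence level into a scale for ("se", scale). The helper z_scale is hypothetical. It uses only the standard library's normal distribution. A t interval would instead need a quantile that depends on each group's sample size, which a single scale value cannot express.

```python
from statistics import NormalDist


def z_scale(confidence):
    """Two-sided standard normal quantile for a confidence level in (0, 1)."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


# A ~95% parametric (Z) confidence interval in the proposed tuple form.
errorbar = ("se", z_scale(0.95))
```

Here z_scale(0.95) is approximately 1.96, matching the example given for scale above.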

API decisions aside, the right implementation is going to take some thinking. Currently each module does its own errorbar computations. Most errorbars appear in the context of an aggregation-with-estimator operation. This can likely be abstracted. The other place they show up is in the regression module, where error bars are shown around the regression line. This needs to be handled differently, but statsmodels now has the get_prediction method which will do a lot of the work for us. We'll need a general enough implementation such that we can handle special cases (like logistic regression, where the SE/SD scaling should happen in logit space).
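
One possible shape for the aggregation-with-estimator abstraction is sketched below. It is only an assumption about how the pieces could fit together: Aggregator, sd_interval, and aggregate_by are hypothetical names, and the regression case is not covered.

```python
import numpy as np


class Aggregator:
    """Pair an estimator with an error function (hypothetical sketch)."""

    def __init__(self, estimator, errorfunc):
        self.estimator = estimator
        # errorfunc(values, estimate) must return a (low, high) pair.
        self.errorfunc = errorfunc

    def __call__(self, values):
        values = np.asarray(values, dtype=float)
        estimate = float(self.estimator(values))
        low, high = self.errorfunc(values, estimate)
        return {"estimate": estimate, "low": float(low), "high": float(high)}


def sd_interval(scale=1):
    """Build an error function showing +/- scale standard deviations."""
    def func(values, estimate):
        half_width = scale * values.std(ddof=1)
        return estimate - half_width, estimate + half_width
    return func


def aggregate_by(groups, values, agg):
    """Apply one aggregator to each group of values."""
    values = np.asarray(values, dtype=float)
    result = {}
    for key in sorted(set(groups)):
        mask = np.array([g == key for g in groups])
        result[key] = agg(values[mask])
    return result


agg = Aggregator(np.mean, sd_interval(scale=1))
summary = aggregate_by(list("aaabbb"), [1, 2, 3, 10, 20, 30], agg)
```

With something like this, each plotting module would only need to build an aggregator from its parameters instead of computing error bars itself.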

Here are some assorted open questions:

  • Should we accept simple strings (e.g. errorbar="sd") with a default level value used internally?
  • This simple four-option system is still fairly limiting; it may disappoint those who would like to be able to use a generic function to get error bars (e.g. Added option for ci to be a callable. #2332). What might that API look like?
  • Is it a sensible API option for sd to correspond to the prediction interval in a regression model?
  • Should standard error correspond to the estimator and raise if the one used doesn't have a defined standard error? In other words, what would we do with estimator="median", errorbar="se"? And if the estimator is a callable, should we use its name to associate with the correct standard error function?
  • It would be nice to have seaborn support multiple error bars from a sequence of level parameters, e.g. 1-2-3 sigma bands or 68-95-99 CIs (e.g. [Feature request] Support lineplot with error bands of list of scaled standard deviations #1492). I like this kind of plot, but each plotting function will have to define its own logic for showing multiple error bars (e.g. layered alpha for error bands in lineplot, lines of diminishing width in pointplot). But still, if it's going to happen, we should at least plan for it here.
  • What about additional arguments for bootstrapping (i.e. n_boot, seed)? It would be nice to reduce the number of parameters in the main function signatures, but I would like to keep the argument for errorbar a simple tuple and not a more complex object that could take optional parameters. I think...
  • What about loess? (regplot: support confidence intervals with lowess model #552). Bootstrapping is still very slow, but statsmodels seems to still not have analytic confidence bands.
  • What's the right order of operations for working on this? It should probably not be (fully) implemented until the categorical/regression modules can be refactored to use the core objects (where this should be handled).
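
To make the first two questions more tangible, here is one way the argument could be normalized if both string shorthands and callables were accepted. Everything here is an assumption for discussion: normalize_errorbar is a hypothetical helper, and the default levels are placeholders rather than decided values.

```python
# Assumed defaults, for illustration only; sizes are fractions in (0, 1).
DEFAULT_LEVELS = {"se": 1, "sd": 1, "ci": 0.95, "pi": 0.95}


def normalize_errorbar(errorbar):
    """Turn None, a string, a (kind, level) tuple, or a callable into a pair."""
    if errorbar is None:
        return None, None
    if callable(errorbar):
        # A generic function computes the interval itself; no level applies.
        return errorbar, None
    if isinstance(errorbar, str):
        kind, level = errorbar, None
    else:
        try:
            kind, level = errorbar
        except (TypeError, ValueError) as err:
            raise ValueError("errorbar must be a string or a (kind, level) tuple") from err
    if not isinstance(kind, str) or kind not in DEFAULT_LEVELS:
        raise ValueError(f"unknown errorbar kind: {kind!r}")
    if level is None:
        level = DEFAULT_LEVELS[kind]
    return kind, level
```

Under these assumptions, normalize_errorbar("sd") and normalize_errorbar(("sd", 1)) would be equivalent, and a callable would pass through untouched. Sequences of levels and bootstrap options are left out of this sketch.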
