[discussion] Adding ECDF to seaborn? #1536

@ericmjl

@mwaskom, following up on this tweet re: ECDFs: I have a simple implementation ready to go, which I keep stored in TextExpander, and I think it could be a useful contribution for seaborn users.

The simplest unit of visualization is a scatterplot, for which an API might be:

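import matplotlib.pyplot as plt
import numpy as np

def ecdf(df, column, ax=None, step=True):
    # Fall back to the current axes when none is passed (one possible "if ax" logic).
    if ax is None:
        ax = plt.gca()
    # Sorted values on x, cumulative fraction of observations on y.
    x = np.sort(df[column])
    y = np.arange(1, len(x) + 1) / len(x)
    if step:
        ax.step(x, y)
    else:
        ax.scatter(x, y)
    return ax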

This plotting unit could easily be dropped into pairplot as a replacement for the histogram on the diagonal (as an option for end users, of course, not mandatory). I can also see extensions to other kinds of plots, for example, plotting multiple ECDFs on the same axes object.
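To make the "multiple ECDFs on the same axes" idea concrete, here is a rough sketch. The helper names `ecdf_xy` and `plot_ecdfs`, the label-to-values mapping, and the random example data are all made up for illustration; none of this is existing seaborn API.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def ecdf_xy(values):
    """Return sorted values and their cumulative fractions (hypothetical helper)."""
    x = np.sort(np.asarray(values))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


def plot_ecdfs(groups, ax=None):
    """Draw one step ECDF per entry of `groups`, a label -> values mapping."""
    if ax is None:
        ax = plt.gca()
    for label, values in groups.items():
        x, y = ecdf_xy(values)
        # where="post" holds each height until the next observation.
        ax.step(x, y, where="post", label=label)
    ax.set_ylabel("Proportion")
    ax.legend()
    return ax


# Two made-up samples drawn on a single axes object.
rng = np.random.default_rng(0)
groups = {"a": rng.normal(0, 1, 200), "b": rng.normal(1, 1, 200)}
fig, ax = plt.subplots()
plot_ecdfs(groups, ax=ax)
```

Because every curve shares the same 0 to 1 y-axis, the groups can be compared directly without having to agree on a binning first.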

As I understand it, distplot exists, and yes, granted, visualizing histograms is quite idiomatic for many users. That said, I do see some advantages of ECDFs over histograms, the biggest being that every data point is plotted, so the choice of bins cannot distort how the data look. I have more details in a blog post, but at a high level, the other major advantage I can see is that quantiles can be read off the plot easily. Also, compared to estimating a KDE, we make no assumptions about how the data are distributed (though yes, we can debate whether this is a good or bad thing).
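As a small illustration of the quantile point, here is a pure-Python sketch. The function names `ecdf_points` and `quantile_from_ecdf` and the sample data are made up for this example; reading a quantile as "the first observed value whose ECDF height reaches q" is one common convention, not the only one.

```python
def ecdf_points(values):
    """Return the sorted values and the ECDF height at each of them."""
    xs = sorted(values)
    n = len(xs)
    ys = [(i + 1) / n for i in range(n)]
    return xs, ys


def quantile_from_ecdf(values, q):
    """Read off the smallest observed value whose ECDF height is at least q."""
    xs, ys = ecdf_points(values)
    for x, y in zip(xs, ys):
        if y >= q:
            return x
    return xs[-1]


data = [2, 9, 4, 7, 5, 1, 8, 3]  # made-up sample
median = quantile_from_ecdf(data, 0.5)
p90 = quantile_from_ecdf(data, 0.9)
print(median, p90)
```

This mirrors what you do by eye on the plot: draw a horizontal line at height q and drop down where it first meets the curve. No bin width or bandwidth enters anywhere.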

If you're open to having ECDFs inside seaborn, I'm happy to work on a PR for this. I might need some guidance on whether there are special things I need to look out for in the codebase (admittedly, it'll be my first PR to seaborn). Please let me know; I'm also happy to discuss more here before taking any action.
