@mwaskom mwaskom commented May 4, 2021

Closes #2550

We now pass an array of bin edges to the numpy function only when that is what the user supplied. Otherwise we keep track of the number of bins and the bin range. This gives roughly an order-of-magnitude performance boost for medium-to-large arrays (note the log scales):

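import numpy as np
import pandas as pd
import seaborn as sns

bins = 50
# Sample sizes from 1e2 to 1e7, one per decade
ns = np.logspace(2, 7, num=6, dtype=int)
times = pd.Series(index=ns, dtype=float)

for n in ns:
    hist = sns._statistics.Histogram(bins=bins)
    x = np.random.normal(0, 1, size=n)
    # %timeit is an IPython magic, so this runs only in IPython/Jupyter;
    # -o returns the timing result object
    res = %timeit -o hist(x)
    times[n] = res.average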

[image: timing plot, log scales]

No change to external API.
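
For readers who want to see the effect outside seaborn, here is a minimal sketch that calls `np.histogram` directly, once with an explicit array of edges and once with a bin count plus a range. The variable names, seed, and sample size are illustrative assumptions and are not taken from the seaborn code. As far as I understand it, numpy can take a cheaper code path when it knows the bins are uniform, which is what the count-plus-range form tells it.

```python
import timeit

import numpy as np

# Illustrative data; the seed and size are arbitrary choices for this sketch
rng = np.random.default_rng(0)
x = rng.normal(0, 1, size=100_000)

n_bins = 50
bin_range = (x.min(), x.max())

# Form 1: hand numpy an explicit array of bin edges
edges = np.linspace(bin_range[0], bin_range[1], n_bins + 1)
counts_edges, _ = np.histogram(x, bins=edges)

# Form 2: hand numpy only the bin count and the range
counts_range, edges_range = np.histogram(x, bins=n_bins, range=bin_range)

# Both forms describe the same uniform bins, so the counts should agree
same_counts = np.array_equal(counts_edges, counts_range)

# Rough timing comparison; absolute numbers depend on the machine
t_edges = timeit.timeit(lambda: np.histogram(x, bins=edges), number=20)
t_range = timeit.timeit(
    lambda: np.histogram(x, bins=n_bins, range=bin_range), number=20
)
print(f"same counts: {same_counts}")
print(f"explicit edges: {t_edges:.4f}s, count + range: {t_range:.4f}s")
```

The timings printed here are only indicative; the `%timeit` loop above is the measurement that the plot in this PR is based on.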

codecov bot commented May 4, 2021

Codecov Report

Merging #2570 (0c2548e) into master (ec8fcc9) will increase coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head 0c2548e differs from pull request most recent head b29166f. Consider uploading reports for the commit b29166f to get more accurate results.

@@           Coverage Diff           @@
##           master    #2570   +/-   ##
=======================================
  Coverage   97.45%   97.45%           
=======================================
  Files          17       17           
  Lines        6332     6337    +5     
=======================================
+ Hits         6171     6176    +5     
  Misses        161      161           
Impacted Files              Coverage Δ
seaborn/_statistics.py      100.00% <100.00%> (ø)
seaborn/distributions.py    96.37% <100.00%> (ø)

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@mwaskom mwaskom merged commit 09f5746 into master May 5, 2021
@mwaskom mwaskom deleted the histplot_efficiency branch May 5, 2021 11:30
@mwaskom mwaskom modified the milestones: v0.12.0, v0.11.2 Aug 6, 2021
mwaskom added a commit that referenced this pull request Aug 6, 2021
* Define histogram params with bin count and range when possible

* Rename bin method for clarity

* Update release notes [skip ci]

(cherry picked from commit 09f5746)
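
The first commit bullet ("define histogram params with bin count and range when possible") can be sketched roughly as follows. This is not the seaborn implementation: `sketch_bin_params` is a hypothetical helper written for illustration, and it handles only an integer bin count or an explicit array of edges.

```python
import numpy as np


def sketch_bin_params(x, bins=10, binrange=None):
    """Hypothetical helper: build keyword arguments for np.histogram.

    An explicit array of edges is passed through unchanged; otherwise only
    the bin count and the (start, stop) range are recorded.
    """
    x = np.asarray(x, dtype=float)
    if np.ndim(bins) > 0:
        # The user supplied the edges, so that is what numpy receives
        return {"bins": np.asarray(bins, dtype=float)}
    # Default to the data extent when no range is given
    start, stop = binrange if binrange is not None else (x.min(), x.max())
    return {"bins": int(bins), "range": (float(start), float(stop))}


# Usage: the resulting dict is unpacked straight into np.histogram
data = np.random.default_rng(1).normal(size=1_000)
params_count = sketch_bin_params(data, bins=20)
params_edges = sketch_bin_params(data, bins=[-1.0, 0.0, 1.0])
counts, edges = np.histogram(data, **params_count)
```

Keeping the parameters as a small dict means the same call site works for both cases, and numpy only sees an edge array when the caller explicitly asked for one.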
Development

Successfully merging this pull request may close these issues.

Histogram computation is inefficient with large samples