Hard-code the hash function for the quick deflate algorithm #369

Merged (1 commit) on May 27, 2025

Conversation

brian-pane

No description provided.

codecov bot commented May 27, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Flag Coverage Δ
test-aarch64-apple-darwin 87.45% <100.00%> (-0.04%) ⬇️
test-x86_64-apple-darwin 87.34% <100.00%> (-0.05%) ⬇️
test-x86_64-unknown-linux-gnu 87.08% <100.00%> (-0.03%) ⬇️

Files with missing lines Coverage Δ
zlib-rs/src/deflate/algorithm/quick.rs 97.67% <100.00%> (ø)

... and 3 files with indirect coverage changes

@brian-pane
Author

This change seems to improve performance at compression level 1 (the only level where the quick algorithm is used). Oddly, when I tried the same technique for the fast and medium algorithms it produced a regression, so this PR changes only quick.rs.
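
For readers unfamiliar with the technique, here is a minimal sketch of the difference between selecting a hash function at runtime and hard-coding it. Every name in it (`HashVariant`, `HashCalc`, `Multiplicative`, `Rolling`, `hash_all_dynamic`, `hash_all_static`, `hash_all_quick`) and both hash formulas are assumptions made for illustration; this is not the code in quick.rs and not the zlib-rs API.

```rust
// Illustrative sketch only: all names and hash formulas are hypothetical.

const HASH_BITS: u32 = 16;
const HASH_MASK: u32 = (1 << HASH_BITS) - 1;

/// Hash-function choice kept in compressor state and inspected at runtime.
#[derive(Clone, Copy)]
enum HashVariant {
    Multiplicative,
    Rolling,
}

/// Compile-time interface: each hash function is its own type.
trait HashCalc {
    fn hash(prev: u32, bytes: [u8; 4]) -> u32;
}

struct Multiplicative;
struct Rolling;

impl HashCalc for Multiplicative {
    #[inline(always)]
    fn hash(_prev: u32, bytes: [u8; 4]) -> u32 {
        // Multiply by a large odd constant and keep the top HASH_BITS bits.
        u32::from_le_bytes(bytes).wrapping_mul(2_654_435_761) >> (32 - HASH_BITS)
    }
}

impl HashCalc for Rolling {
    #[inline(always)]
    fn hash(prev: u32, bytes: [u8; 4]) -> u32 {
        // Shift the previous hash and mix in one new byte.
        ((prev << 5) ^ u32::from(bytes[0])) & HASH_MASK
    }
}

/// Runtime selection: the `match` sits inside the per-position loop.
fn hash_all_dynamic(variant: HashVariant, data: &[u8]) -> Vec<u32> {
    let mut prev: u32 = 0;
    data.windows(4)
        .map(|w| {
            let bytes = [w[0], w[1], w[2], w[3]];
            prev = match variant {
                HashVariant::Multiplicative => Multiplicative::hash(prev, bytes),
                HashVariant::Rolling => Rolling::hash(prev, bytes),
            };
            prev
        })
        .collect()
}

/// Hard-coded selection: the hash is a type parameter, so each
/// instantiation of this function contains exactly one hash body.
fn hash_all_static<H: HashCalc>(data: &[u8]) -> Vec<u32> {
    let mut prev: u32 = 0;
    data.windows(4)
        .map(|w| {
            prev = H::hash(prev, [w[0], w[1], w[2], w[3]]);
            prev
        })
        .collect()
}

/// A caller that always wants one hash can name it directly.
fn hash_all_quick(data: &[u8]) -> Vec<u32> {
    hash_all_static::<Multiplicative>(data)
}
```

Both paths compute the same values; the hard-coded one simply gives the compiler a loop with no variant check in it. Whether that is faster depends on what the optimizer already does with the runtime version, which may be why the same change helped here but regressed for the fast and medium algorithms.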

Benchmark 1 (67 runs): ./blogpost-compress-baseline 1 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          75.4ms ±  841us    74.6ms … 79.2ms          3 ( 4%)        0%
  peak_rss           26.6MB ± 60.2KB    26.4MB … 26.7MB          3 ( 4%)        0%
  cpu_cycles          291M  ± 1.78M      289M  …  304M           3 ( 4%)        0%
  instructions        555M  ±  275       555M  …  555M           0 ( 0%)        0%
  cache_references    264K  ± 3.84K      261K  …  285K           6 ( 9%)        0%
  cache_misses        227K  ± 7.31K      193K  …  243K           9 (13%)        0%
  branch_misses      3.06M  ± 5.98K     3.04M  … 3.07M           0 ( 0%)        0%
Benchmark 2 (69 runs): ./target/release/examples/blogpost-compress 1 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          73.4ms ±  425us    72.6ms … 74.7ms          3 ( 4%)        ⚡-  2.7% ±  0.3%
  peak_rss           26.6MB ± 49.0KB    26.5MB … 26.7MB          0 ( 0%)          +  0.0% ±  0.1%
  cpu_cycles          283M  ±  582K      282M  …  284M           1 ( 1%)        ⚡-  2.9% ±  0.2%
  instructions        544M  ±  368       544M  …  544M           2 ( 3%)        ⚡-  2.1% ±  0.0%
  cache_references    264K  ± 5.52K      260K  …  292K           6 ( 9%)          -  0.1% ±  0.6%
  cache_misses        228K  ± 7.98K      193K  …  237K           4 ( 6%)          +  0.4% ±  1.1%
  branch_misses      2.91M  ± 6.91K     2.90M  … 2.93M           2 ( 3%)        ⚡-  4.7% ±  0.1%
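
As a sanity check on the delta column, the sketch below recomputes the relative change from the printed means. It assumes the delta is the relative change in the mean against Benchmark 1; the helper name `relative_delta_percent` is hypothetical and is not part of the benchmarking tool.

```rust
/// Relative change of `new` against `baseline`, in percent.
/// Negative values mean `new` is smaller (here: faster or cheaper).
fn relative_delta_percent(baseline: f64, new: f64) -> f64 {
    (new - baseline) / baseline * 100.0
}
```

For wall_time, `relative_delta_percent(75.4, 73.4)` gives about -2.65%, in line with the reported -2.7%. The other rows come out close to, but not exactly at, the reported values (for example about -2.75% for cpu_cycles against a reported -2.9%), presumably because the printed means are rounded.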

Collaborator

@folkertdev folkertdev left a comment

I see improvements for cycles/instructions locally, though the wall-time difference is not significant. Still, I think that confirms the effect is real. Thanks!

Benchmark 2 (65 runs): target/release/examples/blogpost-compress 1 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          78.1ms ± 1.19ms    76.0ms … 81.6ms          0 ( 0%)          -  0.5% ±  0.5%
  peak_rss           26.6MB ± 73.0KB    26.5MB … 26.7MB          0 ( 0%)          -  0.0% ±  0.1%
  cpu_cycles          283M  ± 2.41M      278M  …  291M           6 ( 9%)        ⚡-  1.4% ±  0.3%
  instructions        566M  ±  282       566M  …  566M           0 ( 0%)        ⚡-  2.7% ±  0.0%

@folkertdev folkertdev merged commit e0768e7 into trifectatechfoundation:main May 27, 2025
22 of 24 checks passed
@brian-pane brian-pane deleted the hash-select branch May 27, 2025 17:15