Hard-code the hash selection for the fast and medium deflate algorithms #373

brian-pane · 2025-05-28T16:26:07Z

Note: This only changes the quick_insert_string calls. In testing, bypassing the hash_calc_variant check on the insert_string calls in these algorithms appears to (counterintuitively) hurt performance.
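
For readers unfamiliar with the pattern, the change can be pictured roughly as follows. This is a minimal sketch, not zlib-rs's code: `State`, `HashVariant`, `insert_dispatch`, `insert_standard`, `insert_roll`, and the hash formulas are hypothetical stand-ins, and it assumes the fast and medium algorithms only ever run with one fixed hash variant.

```rust
// Hypothetical, simplified model of a deflate hash table. These types and
// functions are illustrative only and are not zlib-rs's actual definitions.

const HASH_BITS: u32 = 15;
const HASH_SIZE: usize = 1 << HASH_BITS;

#[derive(Clone, Copy, PartialEq, Debug)]
enum HashVariant {
    Standard,
    Roll,
}

struct State {
    variant: HashVariant,
    head: Vec<u16>,
    roll_hash: u32,
}

impl State {
    fn new(variant: HashVariant) -> Self {
        State { variant, head: vec![0; HASH_SIZE], roll_hash: 0 }
    }

    /// Multiplicative hash of the four bytes starting at `pos`.
    fn insert_standard(&mut self, window: &[u8], pos: usize) -> u16 {
        let value = u32::from_le_bytes([
            window[pos],
            window[pos + 1],
            window[pos + 2],
            window[pos + 3],
        ]);
        let hash = (value.wrapping_mul(2_654_435_761) >> (32 - HASH_BITS)) as usize;
        let previous = self.head[hash];
        self.head[hash] = pos as u16;
        previous
    }

    /// Rolling hash that folds one new byte into the running value.
    fn insert_roll(&mut self, window: &[u8], pos: usize) -> u16 {
        self.roll_hash =
            ((self.roll_hash << 5) ^ u32::from(window[pos])) & (HASH_SIZE as u32 - 1);
        let hash = self.roll_hash as usize;
        let previous = self.head[hash];
        self.head[hash] = pos as u16;
        previous
    }

    /// Generic entry point: checks the variant on every call.
    fn insert_dispatch(&mut self, window: &[u8], pos: usize) -> u16 {
        match self.variant {
            HashVariant::Standard => self.insert_standard(window, pos),
            HashVariant::Roll => self.insert_roll(window, pos),
        }
    }
}

/// Inner loop that pays for the variant check at each position.
fn insert_all_dispatch(state: &mut State, window: &[u8]) {
    for pos in 0..window.len().saturating_sub(3) {
        state.insert_dispatch(window, pos);
    }
}

/// Inner loop with the standard hash hard-coded, skipping the check.
/// Only valid under the assumption that this loop never runs with `Roll`.
fn insert_all_hard_coded(state: &mut State, window: &[u8]) {
    debug_assert_eq!(state.variant, HashVariant::Standard);
    for pos in 0..window.len().saturating_sub(3) {
        state.insert_standard(window, pos);
    }
}
```

Both loops fill the table identically when the variant is `Standard`. The hard-coded loop just removes a branch the caller already knows the answer to, which is the kind of saving that would show up as a lower instruction count.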

codecov · 2025-05-28T16:27:29Z

Codecov Report

Attention: Patch coverage is 80.00000% with 1 line in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| zlib-rs/src/deflate/algorithm/medium.rs | 66.66% | 1 Missing ⚠️ |

| Flag | Coverage Δ | |
|---|---|---|
| fuzz-compress | `?` | |
| fuzz-decompress | `?` | |
| test-aarch64-apple-darwin | `93.38% <80.00%> (-0.02%)` | ⬇️ |
| test-x86_64-apple-darwin | `91.70% <80.00%> (+0.01%)` | ⬆️ |
| test-x86_64-unknown-linux-gnu | `90.45% <80.00%> (ø)` | |

| Files with missing lines | Coverage Δ |
|---|---|
| zlib-rs/src/deflate/algorithm/fast.rs | `97.56% <100.00%> (ø)` |
| zlib-rs/src/deflate/algorithm/medium.rs | `92.27% <66.66%> (ø)` |

... and 5 files with indirect coverage changes

brian-pane · 2025-05-28T16:34:45Z

On my x86_64 test system (Intel i5-12400), this shows an improvement for compression levels 5 and 6 and doesn't appear to hurt the performance at levels 2 through 4.

Benchmark 1 (42 runs): ./blogpost-compress-baseline 2 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           121ms ± 3.56ms     119ms …  135ms          4 (10%)        0%
  peak_rss           24.9MB ± 61.1KB    24.8MB … 25.1MB          0 ( 0%)        0%
  cpu_cycles          489M  ± 6.47M      485M  …  515M           4 (10%)        0%
  instructions       1.08G  ±  374      1.08G  … 1.08G           1 ( 2%)        0%
  cache_references    275K  ± 22.0K      265K  …  398K           3 ( 7%)        0%
  cache_misses        234K  ± 8.36K      204K  …  249K           4 (10%)        0%
  branch_misses      6.19M  ± 6.34K     6.18M  … 6.22M           2 ( 5%)        0%
Benchmark 2 (42 runs): ./target/release/examples/blogpost-compress 2 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           120ms ±  972us     119ms …  125ms          1 ( 2%)          -  1.4% ±  0.9%
  peak_rss           24.9MB ± 75.7KB    24.6MB … 25.0MB          0 ( 0%)          -  0.1% ±  0.1%
  cpu_cycles          486M  ± 3.88M      483M  …  510M           1 ( 2%)          -  0.5% ±  0.5%
  instructions       1.07G  ±  382      1.07G  … 1.07G           0 ( 0%)          -  0.9% ±  0.0%
  cache_references    268K  ± 4.11K      264K  …  287K           4 (10%)          -  2.5% ±  2.5%
  cache_misses        232K  ± 6.48K      206K  …  240K           2 ( 5%)          -  0.9% ±  1.4%
  branch_misses      6.18M  ± 5.27K     6.17M  … 6.19M           0 ( 0%)          -  0.2% ±  0.0%
Benchmark 1 (37 runs): ./blogpost-compress-baseline 3 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           137ms ±  858us     136ms …  141ms          2 ( 5%)        0%
  peak_rss           24.7MB ± 68.3KB    24.5MB … 24.8MB          0 ( 0%)        0%
  cpu_cycles          563M  ± 2.99M      561M  …  580M           2 ( 5%)        0%
  instructions       1.41G  ±  307      1.41G  … 1.41G           0 ( 0%)        0%
  cache_references    273K  ± 18.7K      264K  …  364K           5 (14%)        0%
  cache_misses        227K  ± 11.3K      200K  …  239K           8 (22%)        0%
  branch_misses      7.07M  ± 5.35K     7.06M  … 7.08M           0 ( 0%)        0%
Benchmark 2 (37 runs): ./target/release/examples/blogpost-compress 3 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           137ms ± 1.51ms     136ms …  143ms          3 ( 8%)          -  0.1% ±  0.4%
  peak_rss           24.7MB ± 59.8KB    24.6MB … 24.8MB          0 ( 0%)          +  0.1% ±  0.1%
  cpu_cycles          561M  ± 4.74M      559M  …  589M           1 ( 3%)          -  0.3% ±  0.3%
  instructions       1.40G  ±  245      1.40G  … 1.40G           1 ( 3%)          -  0.6% ±  0.0%
  cache_references    276K  ± 26.6K      263K  …  382K           3 ( 8%)          +  1.0% ±  3.9%
  cache_misses        231K  ± 7.34K      202K  …  240K           4 (11%)          +  1.9% ±  1.9%
  branch_misses      7.06M  ± 5.01K     7.05M  … 7.07M           0 ( 0%)          -  0.1% ±  0.0%
Benchmark 1 (32 runs): ./blogpost-compress-baseline 4 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           160ms ± 1.09ms     159ms …  163ms          2 ( 6%)        0%
  peak_rss           24.6MB ± 72.5KB    24.4MB … 24.7MB          0 ( 0%)        0%
  cpu_cycles          663M  ± 3.95M      661M  …  681M           3 ( 9%)        0%
  instructions       1.51G  ±  355      1.51G  … 1.51G           0 ( 0%)        0%
  cache_references    268K  ± 2.69K      265K  …  277K           1 ( 3%)        0%
  cache_misses        231K  ± 8.68K      208K  …  239K           4 (13%)        0%
  branch_misses      7.57M  ± 4.54K     7.56M  … 7.58M           1 ( 3%)        0%
Benchmark 2 (32 runs): ./target/release/examples/blogpost-compress 4 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           159ms ± 1.44ms     158ms …  166ms          1 ( 3%)          -  0.3% ±  0.4%
  peak_rss           24.5MB ± 62.8KB    24.4MB … 24.7MB          0 ( 0%)          -  0.2% ±  0.1%
  cpu_cycles          661M  ± 5.46M      659M  …  690M           1 ( 3%)          -  0.3% ±  0.4%
  instructions       1.50G  ±  713      1.50G  … 1.50G           2 ( 6%)          -  0.6% ±  0.0%
  cache_references    269K  ± 5.41K      264K  …  288K           5 (16%)          +  0.3% ±  0.8%
  cache_misses        230K  ± 8.44K      200K  …  237K           3 ( 9%)          -  0.3% ±  1.9%
  branch_misses      7.64M  ± 4.18K     7.64M  … 7.65M           0 ( 0%)          +  1.0% ±  0.0%
Benchmark 1 (29 runs): ./blogpost-compress-baseline 5 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           177ms ±  707us     177ms …  180ms          2 ( 7%)        0%
  peak_rss           24.5MB ± 71.8KB    24.4MB … 24.6MB          1 ( 3%)        0%
  cpu_cycles          741M  ±  837K      739M  …  743M           1 ( 3%)        0%
  instructions       1.73G  ±  223      1.73G  … 1.73G           0 ( 0%)        0%
  cache_references    270K  ± 8.34K      264K  …  309K           4 (14%)        0%
  cache_misses        234K  ± 4.15K      217K  …  240K           1 ( 3%)        0%
  branch_misses      8.23M  ± 10.5K     8.21M  … 8.25M           1 ( 3%)        0%
Benchmark 2 (29 runs): ./target/release/examples/blogpost-compress 5 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           174ms ±  536us     174ms …  176ms          1 ( 3%)        ⚡-  1.8% ±  0.2%
  peak_rss           24.6MB ± 72.3KB    24.4MB … 24.7MB          0 ( 0%)          +  0.0% ±  0.2%
  cpu_cycles          728M  ± 2.08M      726M  …  738M           1 ( 3%)        ⚡-  1.8% ±  0.1%
  instructions       1.73G  ±  282      1.73G  … 1.73G           0 ( 0%)          -  0.3% ±  0.0%
  cache_references    271K  ± 9.88K      263K  …  309K           3 (10%)          +  0.2% ±  1.8%
  cache_misses        231K  ± 10.1K      205K  …  240K           4 (14%)          -  1.5% ±  1.7%
  branch_misses      8.21M  ± 7.63K     8.19M  … 8.22M           0 ( 0%)          -  0.2% ±  0.1%
Benchmark 1 (23 runs): ./blogpost-compress-baseline 6 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           219ms ± 1.22ms     217ms …  223ms          1 ( 4%)        0%
  peak_rss           24.5MB ± 81.8KB    24.3MB … 24.7MB          0 ( 0%)        0%
  cpu_cycles          915M  ±  700K      914M  …  917M           0 ( 0%)        0%
  instructions       1.90G  ±  385      1.90G  … 1.90G           2 ( 9%)        0%
  cache_references    278K  ± 17.7K      267K  …  342K           2 ( 9%)        0%
  cache_misses        233K  ± 5.65K      211K  …  241K           1 ( 4%)        0%
  branch_misses      8.40M  ± 8.18K     8.38M  … 8.41M           0 ( 0%)        0%
Benchmark 2 (24 runs): ./target/release/examples/blogpost-compress 6 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           215ms ±  762us     214ms …  217ms          0 ( 0%)        ⚡-  1.5% ±  0.3%
  peak_rss           24.5MB ± 89.5KB    24.4MB … 24.7MB          2 ( 8%)          +  0.1% ±  0.2%
  cpu_cycles          901M  ±  707K      900M  …  903M           1 ( 4%)        ⚡-  1.5% ±  0.0%
  instructions       1.90G  ±  353      1.90G  … 1.90G           5 (21%)          -  0.3% ±  0.0%
  cache_references    271K  ± 5.90K      266K  …  292K           3 (13%)          -  2.5% ±  2.8%
  cache_misses        234K  ± 3.02K      225K  …  240K           1 ( 4%)          +  0.6% ±  1.1%
  branch_misses      8.34M  ± 6.89K     8.33M  … 8.36M           0 ( 0%)          -  0.6% ±  0.1%
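
As a rough cross-check of the delta column, the relative change can be recomputed from the rounded means shown above. This is only a sketch: the helper names are hypothetical, and the benchmark tool presumably derives its deltas from the unrounded samples, so the results will only approximately agree.

```rust
/// Relative change of `candidate` versus `baseline`, in percent.
/// Negative values mean the candidate used less of the resource.
fn relative_change_percent(baseline: f64, candidate: f64) -> f64 {
    (candidate - baseline) / baseline * 100.0
}

/// Mean cpu_cycles (in millions) copied from the tables above,
/// as (compression level, baseline, candidate).
fn cpu_cycle_means() -> [(u32, f64, f64); 2] {
    [(5, 741.0, 728.0), (6, 915.0, 901.0)]
}

/// Recomputes the cpu_cycles delta for each listed level.
fn cpu_cycle_deltas() -> Vec<(u32, f64)> {
    cpu_cycle_means()
        .iter()
        .map(|&(level, base, cand)| (level, relative_change_percent(base, cand)))
        .collect()
}
```

With these inputs the recomputed changes come out at about -1.8% for level 5 and -1.5% for level 6, in line with the reported deltas.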

folkertdev

I can roughly confirm the numbers (wall_time is not significant, but the decrease in instructions seems real and makes sense).

Something to try: add some #[inline] or #[inline(always)] to the functions. I get mixed results when I try it, but maybe there is some combination that also works for e.g. insert_string?
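
To make the suggestion concrete, here is a sketch of what annotating small hash helpers could look like. The function names and bodies are hypothetical and do not match zlib-rs. Both attributes are only hints to the compiler, so any real choice would need to be confirmed with benchmarks.

```rust
// Hypothetical helpers for illustration; not zlib-rs's actual functions.

/// `#[inline]` suggests inlining, including across crate boundaries,
/// but leaves the final decision to the compiler.
#[inline]
fn hash4(value: u32) -> u32 {
    // Multiplicative hash reduced to 15 bits.
    value.wrapping_mul(2_654_435_761) >> 17
}

/// `#[inline(always)]` is a stronger hint for tiny, hot helpers.
/// It can increase code size, so it is worth measuring each use.
#[inline(always)]
fn update_head(head: &mut [u16], hash: usize, pos: u16) -> u16 {
    let previous = head[hash];
    head[hash] = pos;
    previous
}

/// Caller that benefits if both helpers are inlined into its body.
#[inline]
fn insert_position(head: &mut [u16], window: &[u8], pos: usize) -> u16 {
    let value = u32::from_le_bytes([
        window[pos],
        window[pos + 1],
        window[pos + 2],
        window[pos + 3],
    ]);
    update_head(head, hash4(value) as usize, pos as u16)
}
```

Trying different combinations (for example, `#[inline(always)]` on the leaf helper only) and comparing instruction counts is one way to explore the mixed results mentioned above.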

Hard-code the hash selection for the fast and medium deflate algorithms

3db3aae

folkertdev approved these changes May 28, 2025

folkertdev merged commit 862005b into trifectatechfoundation:main May 28, 2025
24 checks passed

brian-pane deleted the hash-selection branch May 28, 2025 21:29

BrewTestBot mentioned this pull request Jun 6, 2025

zlib-rs 0.5.1 Homebrew/homebrew-core#225936

Merged
