Hard-code the hash selection for the fast and medium deflate algorithms #373

Merged: 1 commit, May 28, 2025

Conversation

brian-pane

Note: This only changes the quick_insert_string calls. In testing, bypassing the hash_calc_variant check on the insert_string calls in these algorithms appears to (counterintuitively) hurt performance.
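To illustrate the idea, here is a minimal toy sketch of the difference between a per-call variant check and a hard-coded hash. It assumes that the fast and medium algorithms only ever run with one known hash variant, as the PR title suggests. Every name in it (`ToyState`, `HashVariant`, `quick_insert_dispatch`, `quick_insert_standard`) and both hash functions are hypothetical stand-ins, not the zlib-rs API.

```rust
// Toy model of a hash-table insert. Every name here is hypothetical;
// none of this is the actual zlib-rs API.
#[derive(Clone, Copy)]
enum HashVariant {
    Standard,
    Roll,
}

const HASH_BITS: u32 = 8;
const HASH_SIZE: usize = 1 << HASH_BITS;

struct ToyState {
    variant: HashVariant,
    window: Vec<u8>,
    head: Vec<u16>, // most recent position recorded for each hash bucket
}

impl ToyState {
    fn new(variant: HashVariant, window: &[u8]) -> Self {
        ToyState { variant, window: window.to_vec(), head: vec![0; HASH_SIZE] }
    }

    // Illustrative multiplicative hash of the 4 bytes starting at `pos`.
    fn standard_hash(&self, pos: usize) -> usize {
        let b = &self.window[pos..pos + 4];
        let v = u32::from_le_bytes([b[0], b[1], b[2], b[3]]);
        (v.wrapping_mul(2_654_435_761) >> (32 - HASH_BITS)) as usize
    }

    // Placeholder second variant, present only so the dispatch has two arms.
    fn roll_hash(&self, pos: usize) -> usize {
        let b = &self.window[pos..pos + 3];
        (((b[0] as usize) << 10) ^ ((b[1] as usize) << 5) ^ b[2] as usize) & (HASH_SIZE - 1)
    }

    // Record `pos` in the bucket and return the position stored there before.
    fn insert_at(&mut self, hash: usize, pos: usize) -> u16 {
        let prev = self.head[hash];
        self.head[hash] = pos as u16;
        prev
    }

    // Generic path: checks the variant on every call.
    fn quick_insert_dispatch(&mut self, pos: usize) -> u16 {
        let hash = match self.variant {
            HashVariant::Standard => self.standard_hash(pos),
            HashVariant::Roll => self.roll_hash(pos),
        };
        self.insert_at(hash, pos)
    }

    // Hard-coded path: only valid when the caller knows the variant is Standard.
    fn quick_insert_standard(&mut self, pos: usize) -> u16 {
        let hash = self.standard_hash(pos);
        self.insert_at(hash, pos)
    }
}

fn main() {
    let data = b"abcdabcdabcdabcd";
    let mut generic = ToyState::new(HashVariant::Standard, data);
    let mut fixed = ToyState::new(HashVariant::Standard, data);
    for pos in 0..data.len() - 3 {
        assert_eq!(generic.quick_insert_dispatch(pos), fixed.quick_insert_standard(pos));
    }
    println!("both paths agree");
}
```

When the variant is `Standard`, the two paths produce identical results. The hard-coded one simply skips the `match`. Whether that is actually faster has to be measured.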


codecov bot commented May 28, 2025

Codecov Report

Attention: Patch coverage is 80.00000% with 1 line in your changes missing coverage. Please review.

Files with missing lines                   Patch %   Lines
zlib-rs/src/deflate/algorithm/medium.rs    66.66%    1 Missing ⚠️

Flag                             Coverage Δ
fuzz-compress                    ?
fuzz-decompress                  ?
test-aarch64-apple-darwin        93.38% <80.00%> (-0.02%) ⬇️
test-x86_64-apple-darwin         91.70% <80.00%> (+0.01%) ⬆️
test-x86_64-unknown-linux-gnu    90.45% <80.00%> (ø)

Flags with carried forward coverage won't be shown.

Files with missing lines                   Coverage Δ
zlib-rs/src/deflate/algorithm/fast.rs      97.56% <100.00%> (ø)
zlib-rs/src/deflate/algorithm/medium.rs    92.27% <66.66%> (ø)

... and 5 files with indirect coverage changes

@brian-pane
Author

On my x86_64 test system (Intel i5-12400), this shows an improvement for compression levels 5 and 6 and doesn't appear to hurt the performance at levels 2 through 4.

Benchmark 1 (42 runs): ./blogpost-compress-baseline 2 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           121ms ± 3.56ms     119ms …  135ms          4 (10%)        0%
  peak_rss           24.9MB ± 61.1KB    24.8MB … 25.1MB          0 ( 0%)        0%
  cpu_cycles          489M  ± 6.47M      485M  …  515M           4 (10%)        0%
  instructions       1.08G  ±  374      1.08G  … 1.08G           1 ( 2%)        0%
  cache_references    275K  ± 22.0K      265K  …  398K           3 ( 7%)        0%
  cache_misses        234K  ± 8.36K      204K  …  249K           4 (10%)        0%
  branch_misses      6.19M  ± 6.34K     6.18M  … 6.22M           2 ( 5%)        0%
Benchmark 2 (42 runs): ./target/release/examples/blogpost-compress 2 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           120ms ±  972us     119ms …  125ms          1 ( 2%)          -  1.4% ±  0.9%
  peak_rss           24.9MB ± 75.7KB    24.6MB … 25.0MB          0 ( 0%)          -  0.1% ±  0.1%
  cpu_cycles          486M  ± 3.88M      483M  …  510M           1 ( 2%)          -  0.5% ±  0.5%
  instructions       1.07G  ±  382      1.07G  … 1.07G           0 ( 0%)          -  0.9% ±  0.0%
  cache_references    268K  ± 4.11K      264K  …  287K           4 (10%)          -  2.5% ±  2.5%
  cache_misses        232K  ± 6.48K      206K  …  240K           2 ( 5%)          -  0.9% ±  1.4%
  branch_misses      6.18M  ± 5.27K     6.17M  … 6.19M           0 ( 0%)          -  0.2% ±  0.0%
Benchmark 1 (37 runs): ./blogpost-compress-baseline 3 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           137ms ±  858us     136ms …  141ms          2 ( 5%)        0%
  peak_rss           24.7MB ± 68.3KB    24.5MB … 24.8MB          0 ( 0%)        0%
  cpu_cycles          563M  ± 2.99M      561M  …  580M           2 ( 5%)        0%
  instructions       1.41G  ±  307      1.41G  … 1.41G           0 ( 0%)        0%
  cache_references    273K  ± 18.7K      264K  …  364K           5 (14%)        0%
  cache_misses        227K  ± 11.3K      200K  …  239K           8 (22%)        0%
  branch_misses      7.07M  ± 5.35K     7.06M  … 7.08M           0 ( 0%)        0%
Benchmark 2 (37 runs): ./target/release/examples/blogpost-compress 3 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           137ms ± 1.51ms     136ms …  143ms          3 ( 8%)          -  0.1% ±  0.4%
  peak_rss           24.7MB ± 59.8KB    24.6MB … 24.8MB          0 ( 0%)          +  0.1% ±  0.1%
  cpu_cycles          561M  ± 4.74M      559M  …  589M           1 ( 3%)          -  0.3% ±  0.3%
  instructions       1.40G  ±  245      1.40G  … 1.40G           1 ( 3%)          -  0.6% ±  0.0%
  cache_references    276K  ± 26.6K      263K  …  382K           3 ( 8%)          +  1.0% ±  3.9%
  cache_misses        231K  ± 7.34K      202K  …  240K           4 (11%)          +  1.9% ±  1.9%
  branch_misses      7.06M  ± 5.01K     7.05M  … 7.07M           0 ( 0%)          -  0.1% ±  0.0%
Benchmark 1 (32 runs): ./blogpost-compress-baseline 4 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           160ms ± 1.09ms     159ms …  163ms          2 ( 6%)        0%
  peak_rss           24.6MB ± 72.5KB    24.4MB … 24.7MB          0 ( 0%)        0%
  cpu_cycles          663M  ± 3.95M      661M  …  681M           3 ( 9%)        0%
  instructions       1.51G  ±  355      1.51G  … 1.51G           0 ( 0%)        0%
  cache_references    268K  ± 2.69K      265K  …  277K           1 ( 3%)        0%
  cache_misses        231K  ± 8.68K      208K  …  239K           4 (13%)        0%
  branch_misses      7.57M  ± 4.54K     7.56M  … 7.58M           1 ( 3%)        0%
Benchmark 2 (32 runs): ./target/release/examples/blogpost-compress 4 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           159ms ± 1.44ms     158ms …  166ms          1 ( 3%)          -  0.3% ±  0.4%
  peak_rss           24.5MB ± 62.8KB    24.4MB … 24.7MB          0 ( 0%)          -  0.2% ±  0.1%
  cpu_cycles          661M  ± 5.46M      659M  …  690M           1 ( 3%)          -  0.3% ±  0.4%
  instructions       1.50G  ±  713      1.50G  … 1.50G           2 ( 6%)          -  0.6% ±  0.0%
  cache_references    269K  ± 5.41K      264K  …  288K           5 (16%)          +  0.3% ±  0.8%
  cache_misses        230K  ± 8.44K      200K  …  237K           3 ( 9%)          -  0.3% ±  1.9%
  branch_misses      7.64M  ± 4.18K     7.64M  … 7.65M           0 ( 0%)          +  1.0% ±  0.0%
Benchmark 1 (29 runs): ./blogpost-compress-baseline 5 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           177ms ±  707us     177ms …  180ms          2 ( 7%)        0%
  peak_rss           24.5MB ± 71.8KB    24.4MB … 24.6MB          1 ( 3%)        0%
  cpu_cycles          741M  ±  837K      739M  …  743M           1 ( 3%)        0%
  instructions       1.73G  ±  223      1.73G  … 1.73G           0 ( 0%)        0%
  cache_references    270K  ± 8.34K      264K  …  309K           4 (14%)        0%
  cache_misses        234K  ± 4.15K      217K  …  240K           1 ( 3%)        0%
  branch_misses      8.23M  ± 10.5K     8.21M  … 8.25M           1 ( 3%)        0%
Benchmark 2 (29 runs): ./target/release/examples/blogpost-compress 5 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           174ms ±  536us     174ms …  176ms          1 ( 3%)        ⚡-  1.8% ±  0.2%
  peak_rss           24.6MB ± 72.3KB    24.4MB … 24.7MB          0 ( 0%)          +  0.0% ±  0.2%
  cpu_cycles          728M  ± 2.08M      726M  …  738M           1 ( 3%)        ⚡-  1.8% ±  0.1%
  instructions       1.73G  ±  282      1.73G  … 1.73G           0 ( 0%)          -  0.3% ±  0.0%
  cache_references    271K  ± 9.88K      263K  …  309K           3 (10%)          +  0.2% ±  1.8%
  cache_misses        231K  ± 10.1K      205K  …  240K           4 (14%)          -  1.5% ±  1.7%
  branch_misses      8.21M  ± 7.63K     8.19M  … 8.22M           0 ( 0%)          -  0.2% ±  0.1%
Benchmark 1 (23 runs): ./blogpost-compress-baseline 6 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           219ms ± 1.22ms     217ms …  223ms          1 ( 4%)        0%
  peak_rss           24.5MB ± 81.8KB    24.3MB … 24.7MB          0 ( 0%)        0%
  cpu_cycles          915M  ±  700K      914M  …  917M           0 ( 0%)        0%
  instructions       1.90G  ±  385      1.90G  … 1.90G           2 ( 9%)        0%
  cache_references    278K  ± 17.7K      267K  …  342K           2 ( 9%)        0%
  cache_misses        233K  ± 5.65K      211K  …  241K           1 ( 4%)        0%
  branch_misses      8.40M  ± 8.18K     8.38M  … 8.41M           0 ( 0%)        0%
Benchmark 2 (24 runs): ./target/release/examples/blogpost-compress 6 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           215ms ±  762us     214ms …  217ms          0 ( 0%)        ⚡-  1.5% ±  0.3%
  peak_rss           24.5MB ± 89.5KB    24.4MB … 24.7MB          2 ( 8%)          +  0.1% ±  0.2%
  cpu_cycles          901M  ±  707K      900M  …  903M           1 ( 4%)        ⚡-  1.5% ±  0.0%
  instructions       1.90G  ±  353      1.90G  … 1.90G           5 (21%)          -  0.3% ±  0.0%
  cache_references    271K  ± 5.90K      266K  …  292K           3 (13%)          -  2.5% ±  2.8%
  cache_misses        234K  ± 3.02K      225K  …  240K           1 ( 4%)          +  0.6% ±  1.1%
  branch_misses      8.34M  ± 6.89K     8.33M  … 8.36M           0 ( 0%)          -  0.6% ±  0.1%
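
As a rough sanity check, the delta column can be approximated from the rounded means printed above. The sketch below uses the cpu_cycles means for levels 5 and 6. The helper name `relative_delta_percent` is made up for illustration, and the benchmark tool presumably works from unrounded samples, so small differences are expected.

```rust
// Relative change of `candidate` versus `baseline`, in percent.
// Hypothetical helper for illustration only.
fn relative_delta_percent(baseline: f64, candidate: f64) -> f64 {
    (candidate - baseline) / baseline * 100.0
}

fn main() {
    // cpu_cycles means (in millions) copied from the level 5 and level 6 runs.
    println!("level 5: {:+.1}%", relative_delta_percent(741.0, 728.0));
    println!("level 6: {:+.1}%", relative_delta_percent(915.0, 901.0));
}
```

Both values land close to the reported -1.8% and -1.5%.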

Collaborator

@folkertdev folkertdev left a comment

I can roughly confirm the numbers (wall_time is not significant, but the decrease in instructions seems real and makes sense).

Something to try: add some #[inline] or #[inline(always)] to the functions. I get mixed results when I try it, but maybe there is some combination that also works for e.g. insert_string?
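
For reference, a minimal sketch of the two attributes on a small helper. The function names and the hash itself are hypothetical and not taken from zlib-rs. Both attributes are hints to the compiler (`#[inline(always)]` is the stronger one), and neither changes what the function computes, so any benefit has to be confirmed by benchmarking.

```rust
// Hypothetical helpers; the names and the hash function are illustrative only.

// Suggests inlining, including across crate boundaries.
#[inline]
fn hash_hint(v: u32) -> u32 {
    v.wrapping_mul(2_654_435_761) >> 16
}

// Strongly requests inlining at every call site.
#[inline(always)]
fn hash_forced(v: u32) -> u32 {
    v.wrapping_mul(2_654_435_761) >> 16
}

fn main() {
    // The attributes affect code generation only, never the result.
    for v in 0..1_000u32 {
        assert_eq!(hash_hint(v), hash_forced(v));
    }
    println!("results are identical");
}
```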

@folkertdev folkertdev merged commit 862005b into trifectatechfoundation:main May 28, 2025
24 checks passed
@brian-pane brian-pane deleted the hash-selection branch May 28, 2025 21:29