
Cache locality improvement for deflate State.lookahead #372

Merged: 1 commit, merged May 28, 2025

brian-pane

No description provided.

codecov bot commented May 28, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Flag Coverage Δ
fuzz-compress ?
fuzz-decompress ?
test-aarch64-apple-darwin 93.37% <100.00%> (+<0.01%) ⬆️
test-x86_64-apple-darwin 91.70% <100.00%> (-0.01%) ⬇️
test-x86_64-unknown-linux-gnu 90.46% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
zlib-rs/src/deflate.rs 97.12% <100.00%> (+0.09%) ⬆️
zlib-rs/src/deflate/hash_calc.rs 100.00% <100.00%> (ø)
zlib-rs/src/deflate/longest_match.rs 95.59% <100.00%> (ø)

... and 3 files with indirect coverage changes

@brian-pane
Author

This change sacrifices the precomputed w_mask field, which adds a subtraction instruction to the performance-critical quick_insert_string operation, in order to move the lookahead field (which is also used in performance-critical loops) into the same cache line as the other frequently used fields.
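The tradeoff described above can be sketched roughly as follows. This is a minimal illustration, not zlib-rs's actual State: the struct name HotState, its fields, the 64-byte cache-line size, and the #[repr(C, align(64))] layout are all assumptions made for the example. The mask is derived on demand as w_size - 1 instead of being stored, and a compile-time check using std::mem::offset_of! (stable since Rust 1.77) confirms that lookahead falls inside the first cache line.

```rust
use std::mem;

/// Assumed cache-line size; 64 bytes is common on x86_64 but not universal.
const CACHE_LINE: usize = 64;

/// Hypothetical, simplified stand-in for the deflate state (illustrative only).
/// repr(C) keeps the declared field order; align(64) makes the struct start on
/// a cache-line boundary, so "first 64 bytes" really means "first cache line".
#[allow(dead_code)]
#[repr(C, align(64))]
struct HotState {
    w_size: usize,    // window size, assumed to be a power of two
    strstart: usize,  // current position in the window
    lookahead: usize, // bytes of input still available; read in hot loops
    ins_h: u32,       // running hash value
    cold: [u8; 256],  // placeholder for rarely used fields
}

impl HotState {
    /// No stored w_mask field: derive it when needed, at the cost of one
    /// subtraction, which frees that space for lookahead in the hot line.
    #[inline(always)]
    fn w_mask(&self) -> usize {
        self.w_size - 1
    }
}

// Compile-time check: lookahead lies entirely within the first cache line.
const _: () = assert!(mem::offset_of!(HotState, lookahead) + mem::size_of::<usize>() <= CACHE_LINE);

fn main() {
    let s = HotState { w_size: 1 << 15, strstart: 0, lookahead: 0, ins_h: 0, cold: [0; 256] };
    println!("w_mask = {:#x}", s.w_mask()); // 32 KiB window: prints w_mask = 0x7fff
    println!("lookahead offset = {} bytes", mem::offset_of!(HotState, lookahead));
}
```

The sketch trades one cheap ALU instruction per use of the mask for keeping a frequently read field in a cache line that is already hot, which is the same bet the PR makes; whether it pays off depends on the CPU.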

This improves the cycle count at a few compression levels on my Intel x86_64 test system. It also increases the instruction count (by about 1% at compression level 2), but that doesn't result in an increase in cycles, possibly because the CPU is able to schedule the extra subtraction in an otherwise unused slot.

I anticipate that the performance of this PR will vary among CPU types.

Before/after:

Benchmark 1 (69 runs): ./blogpost-compress-baseline 1 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          72.8ms ±  694us    72.1ms … 77.2ms          7 (10%)        0%
  peak_rss           26.6MB ± 65.0KB    26.3MB … 26.7MB          1 ( 1%)        0%
  cpu_cycles          283M  ±  891K      281M  …  288M           2 ( 3%)        0%
  instructions        544M  ±  268       544M  …  544M           0 ( 0%)        0%
  cache_references    263K  ± 5.70K      260K  …  303K           6 ( 9%)        0%
  cache_misses        230K  ± 7.98K      197K  …  237K           5 ( 7%)        0%
  branch_misses      2.91M  ± 6.15K     2.90M  … 2.93M           1 ( 1%)        0%
Benchmark 2 (70 runs): ./target/release/examples/blogpost-compress 1 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          72.1ms ±  426us    71.3ms … 73.1ms          0 ( 0%)          -  1.0% ±  0.3%
  peak_rss           26.6MB ± 59.3KB    26.5MB … 26.8MB          1 ( 1%)          +  0.0% ±  0.1%
  cpu_cycles          279M  ±  608K      278M  …  281M           0 ( 0%)        ⚡-  1.3% ±  0.1%
  instructions        549M  ±  293       549M  …  549M           0 ( 0%)        💩+  1.1% ±  0.0%
  cache_references    263K  ± 3.05K      261K  …  285K           6 ( 9%)          -  0.0% ±  0.6%
  cache_misses        231K  ± 6.85K      200K  …  240K           5 ( 7%)          +  0.5% ±  1.1%
  branch_misses      2.85M  ± 5.40K     2.84M  … 2.86M           0 ( 0%)        ⚡-  2.2% ±  0.1%
Benchmark 1 (42 runs): ./blogpost-compress-baseline 2 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           120ms ± 1.03ms     119ms …  126ms          1 ( 2%)        0%
  peak_rss           24.9MB ± 58.7KB    24.8MB … 25.0MB          0 ( 0%)        0%
  cpu_cycles          489M  ± 1.33M      487M  …  494M           1 ( 2%)        0%
  instructions       1.07G  ±  382      1.07G  … 1.07G           2 ( 5%)        0%
  cache_references    268K  ± 3.50K      264K  …  280K           1 ( 2%)        0%
  cache_misses        232K  ± 9.38K      201K  …  246K           5 (12%)        0%
  branch_misses      6.19M  ± 7.85K     6.18M  … 6.21M           3 ( 7%)        0%
Benchmark 2 (42 runs): ./target/release/examples/blogpost-compress 2 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           119ms ±  720us     118ms …  122ms          1 ( 2%)          -  0.6% ±  0.3%
  peak_rss           24.9MB ± 53.7KB    24.8MB … 25.0MB          0 ( 0%)          +  0.0% ±  0.1%
  cpu_cycles          486M  ± 1.08M      484M  …  488M           0 ( 0%)          -  0.7% ±  0.1%
  instructions       1.08G  ±  326      1.08G  … 1.08G           0 ( 0%)        💩+  1.0% ±  0.0%
  cache_references    272K  ± 19.2K      264K  …  385K           5 (12%)          +  1.8% ±  2.2%
  cache_misses        231K  ± 9.54K      201K  …  244K           7 (17%)          -  0.3% ±  1.8%
  branch_misses      6.20M  ± 4.94K     6.20M  … 6.22M           0 ( 0%)          +  0.2% ±  0.0%
Benchmark 1 (37 runs): ./blogpost-compress-baseline 3 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           138ms ± 1.68ms     136ms …  146ms          3 ( 8%)        0%
  peak_rss           24.7MB ± 88.6KB    24.5MB … 24.8MB          0 ( 0%)        0%
  cpu_cycles          567M  ± 5.26M      564M  …  590M           3 ( 8%)        0%
  instructions       1.40G  ±  352      1.40G  … 1.40G           0 ( 0%)        0%
  cache_references    270K  ± 8.04K      265K  …  311K           4 (11%)        0%
  cache_misses        234K  ± 8.03K      210K  …  240K           4 (11%)        0%
  branch_misses      7.05M  ± 5.27K     7.04M  … 7.06M           0 ( 0%)        0%
Benchmark 2 (37 runs): ./target/release/examples/blogpost-compress 3 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           136ms ±  914us     135ms …  141ms          1 ( 3%)          -  1.1% ±  0.5%
  peak_rss           24.7MB ±  105KB    24.5MB … 24.8MB          0 ( 0%)          +  0.1% ±  0.2%
  cpu_cycles          561M  ± 3.48M      558M  …  581M           1 ( 3%)          -  1.0% ±  0.4%
  instructions       1.41G  ±  309      1.41G  … 1.41G           0 ( 0%)          +  0.8% ±  0.0%
  cache_references    281K  ± 67.1K      265K  …  677K           3 ( 8%)          +  4.1% ±  8.2%
  cache_misses        235K  ± 7.60K      210K  …  251K           4 (11%)          +  0.2% ±  1.5%
  branch_misses      7.08M  ± 7.30K     7.07M  … 7.10M           0 ( 0%)          +  0.5% ±  0.0%
Benchmark 1 (32 runs): ./blogpost-compress-baseline 4 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           161ms ±  804us     160ms …  163ms          0 ( 0%)        0%
  peak_rss           24.5MB ±  104KB    24.3MB … 24.7MB          0 ( 0%)        0%
  cpu_cycles          668M  ±  876K      666M  …  669M           0 ( 0%)        0%
  instructions       1.50G  ±  320      1.50G  … 1.50G           0 ( 0%)        0%
  cache_references    270K  ± 3.03K      265K  …  278K           1 ( 3%)        0%
  cache_misses        234K  ± 9.02K      208K  …  242K           5 (16%)        0%
  branch_misses      7.57M  ± 6.84K     7.56M  … 7.59M           1 ( 3%)        0%
Benchmark 2 (32 runs): ./target/release/examples/blogpost-compress 4 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           159ms ±  782us     158ms …  161ms          2 ( 6%)          -  1.2% ±  0.2%
  peak_rss           24.5MB ±  107KB    24.4MB … 24.7MB          0 ( 0%)          +  0.1% ±  0.2%
  cpu_cycles          660M  ± 1.05M      658M  …  662M           0 ( 0%)        ⚡-  1.2% ±  0.1%
  instructions       1.51G  ±  266      1.51G  … 1.51G           0 ( 0%)          +  0.8% ±  0.0%
  cache_references    270K  ± 4.58K      265K  …  285K           2 ( 6%)          +  0.1% ±  0.7%
  cache_misses        233K  ± 8.12K      211K  …  240K           6 (19%)          -  0.4% ±  1.8%
  branch_misses      7.61M  ± 6.28K     7.60M  … 7.62M           0 ( 0%)          +  0.6% ±  0.0%
Benchmark 1 (28 runs): ./blogpost-compress-baseline 5 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           180ms ±  761us     179ms …  182ms          2 ( 7%)        0%
  peak_rss           24.5MB ±  114KB    24.3MB … 24.7MB          0 ( 0%)        0%
  cpu_cycles          750M  ± 1.42M      749M  …  757M           1 ( 4%)        0%
  instructions       1.72G  ±  252      1.72G  … 1.72G           0 ( 0%)        0%
  cache_references    272K  ± 7.60K      265K  …  297K           5 (18%)        0%
  cache_misses        237K  ± 1.68K      233K  …  241K           0 ( 0%)        0%
  branch_misses      8.26M  ± 6.16K     8.24M  … 8.27M           1 ( 4%)        0%
Benchmark 2 (29 runs): ./target/release/examples/blogpost-compress 5 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           177ms ± 1.10ms     177ms …  182ms          1 ( 3%)          -  1.2% ±  0.3%
  peak_rss           24.5MB ±  113KB    24.3MB … 24.7MB          0 ( 0%)          +  0.1% ±  0.2%
  cpu_cycles          740M  ±  864K      738M  …  742M           1 ( 3%)        ⚡-  1.4% ±  0.1%
  instructions       1.73G  ±  255      1.73G  … 1.73G           0 ( 0%)          +  0.7% ±  0.0%
  cache_references    273K  ± 10.2K      265K  …  310K           3 (10%)          +  0.2% ±  1.8%
  cache_misses        235K  ± 6.40K      213K  …  246K           3 (10%)          -  0.6% ±  1.1%
  branch_misses      8.28M  ± 16.9K     8.26M  … 8.33M           0 ( 0%)          +  0.3% ±  0.1%
Benchmark 1 (23 runs): ./blogpost-compress-baseline 6 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           221ms ±  851us     220ms …  224ms          1 ( 4%)        0%
  peak_rss           24.5MB ±  123KB    24.3MB … 24.7MB          0 ( 0%)        0%
  cpu_cycles          926M  ±  815K      925M  …  927M           0 ( 0%)        0%
  instructions       1.89G  ±  270      1.89G  … 1.89G           0 ( 0%)        0%
  cache_references    270K  ± 4.36K      266K  …  285K           1 ( 4%)        0%
  cache_misses        235K  ± 7.81K      213K  …  240K           3 (13%)        0%
  branch_misses      8.42M  ± 6.52K     8.41M  … 8.43M           0 ( 0%)        0%
Benchmark 2 (23 runs): ./target/release/examples/blogpost-compress 6 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           218ms ± 1.08ms     216ms …  221ms          0 ( 0%)          -  1.1% ±  0.3%
  peak_rss           24.5MB ±  122KB    24.3MB … 24.7MB          0 ( 0%)          +  0.1% ±  0.3%
  cpu_cycles          914M  ± 2.00M      912M  …  920M           2 ( 9%)        ⚡-  1.3% ±  0.1%
  instructions       1.90G  ±  313      1.90G  … 1.90G           0 ( 0%)          +  0.6% ±  0.0%
  cache_references    276K  ± 12.6K      266K  …  315K           2 ( 9%)          +  2.1% ±  2.1%
  cache_misses        235K  ± 8.75K      211K  …  245K           5 (22%)          -  0.0% ±  2.1%
  branch_misses      8.43M  ± 9.55K     8.41M  … 8.46M           1 ( 4%)          +  0.1% ±  0.1%
Benchmark 1 (17 runs): ./blogpost-compress-baseline 7 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           305ms ±  601us     304ms …  307ms          0 ( 0%)        0%
  peak_rss           24.3MB ± 52.3KB    24.2MB … 24.4MB          0 ( 0%)        0%
  cpu_cycles         1.28G  ±  729K     1.28G  … 1.29G           1 ( 6%)        0%
  instructions       2.28G  ±  276      2.28G  … 2.28G           1 ( 6%)        0%
  cache_references    275K  ± 10.7K      268K  …  308K           1 ( 6%)        0%
  cache_misses        236K  ± 5.69K      216K  …  241K           1 ( 6%)        0%
  branch_misses      9.65M  ± 8.90K     9.63M  … 9.67M           0 ( 0%)        0%
Benchmark 2 (17 runs): ./target/release/examples/blogpost-compress 7 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           302ms ±  861us     301ms …  305ms          1 ( 6%)          -  1.0% ±  0.2%
  peak_rss           24.3MB ± 48.3KB    24.3MB … 24.4MB          0 ( 0%)          +  0.0% ±  0.1%
  cpu_cycles         1.27G  ± 1.37M     1.27G  … 1.27G           0 ( 0%)          -  1.0% ±  0.1%
  instructions       2.30G  ±  328      2.30G  … 2.30G           0 ( 0%)          +  0.6% ±  0.0%
  cache_references    278K  ± 28.1K      266K  …  383K           2 (12%)          +  1.0% ±  5.4%
  cache_misses        238K  ± 4.23K      222K  …  241K           1 ( 6%)          +  0.8% ±  1.5%
  branch_misses      9.67M  ± 10.6K     9.65M  … 9.69M           0 ( 0%)          +  0.2% ±  0.1%
Benchmark 1 (13 runs): ./blogpost-compress-baseline 8 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           401ms ± 1.13ms     400ms …  404ms          1 ( 8%)        0%
  peak_rss           24.3MB ± 54.9KB    24.3MB … 24.4MB          0 ( 0%)        0%
  cpu_cycles         1.69G  ± 1.17M     1.69G  … 1.69G           0 ( 0%)        0%
  instructions       2.73G  ±  187      2.73G  … 2.73G           1 ( 8%)        0%
  cache_references    271K  ± 6.13K      267K  …  290K           1 ( 8%)        0%
  cache_misses        237K  ± 5.91K      218K  …  242K           1 ( 8%)        0%
  branch_misses      9.77M  ± 5.90K     9.76M  … 9.79M           0 ( 0%)        0%
Benchmark 2 (13 runs): ./target/release/examples/blogpost-compress 8 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           397ms ±  722us     396ms …  398ms          0 ( 0%)          -  0.9% ±  0.2%
  peak_rss           24.4MB ± 66.6KB    24.2MB … 24.5MB          0 ( 0%)          +  0.1% ±  0.2%
  cpu_cycles         1.67G  ± 1.69M     1.67G  … 1.68G           1 ( 8%)          -  0.9% ±  0.1%
  instructions       2.75G  ±  205      2.75G  … 2.75G           0 ( 0%)          +  0.5% ±  0.0%
  cache_references    271K  ± 3.44K      267K  …  277K           0 ( 0%)          +  0.3% ±  1.5%
  cache_misses        238K  ± 5.16K      222K  …  242K           1 ( 8%)          +  0.3% ±  1.9%
  branch_misses      9.80M  ± 13.2K     9.78M  … 9.82M           0 ( 0%)          +  0.3% ±  0.1%
Benchmark 1 (12 runs): ./blogpost-compress-baseline 9 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           418ms ± 1.13ms     417ms …  420ms          0 ( 0%)        0%
  peak_rss           24.3MB ± 81.5KB    24.2MB … 24.4MB          1 ( 8%)        0%
  cpu_cycles         1.76G  ± 2.62M     1.76G  … 1.77G           0 ( 0%)        0%
  instructions       3.35G  ±  330      3.35G  … 3.35G           0 ( 0%)        0%
  cache_references    283K  ± 28.6K      266K  …  371K           1 ( 8%)        0%
  cache_misses        237K  ± 7.68K      220K  …  243K           3 (25%)        0%
  branch_misses      15.4M  ± 20.6K     15.4M  … 15.5M           0 ( 0%)        0%
Benchmark 2 (12 runs): ./target/release/examples/blogpost-compress 9 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           419ms ± 1.33ms     417ms …  421ms          0 ( 0%)          +  0.1% ±  0.2%
  peak_rss           24.3MB ± 73.4KB    24.2MB … 24.4MB          1 ( 8%)          -  0.0% ±  0.3%
  cpu_cycles         1.77G  ± 2.89M     1.76G  … 1.77G           0 ( 0%)          +  0.1% ±  0.1%
  instructions       3.38G  ±  277      3.38G  … 3.38G           0 ( 0%)          +  1.0% ±  0.0%
  cache_references    277K  ± 8.06K      267K  …  297K           1 ( 8%)          -  1.9% ±  6.3%
  cache_misses        236K  ± 8.79K      210K  …  244K           1 ( 8%)          -  0.3% ±  2.9%
  branch_misses      15.5M  ± 33.4K     15.5M  … 15.5M           0 ( 0%)          +  0.4% ±  0.2%

- state.prev.as_mut_slice()[idx as usize & state.w_mask] = head;
+ state.prev.as_mut_slice()[idx as usize & state.w_mask()] = head;
Collaborator

I see some further improvements at level 2 when the state.w_mask() call is moved out of the loop. Have you tried that, or could you?
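The suggestion can be sketched as follows, assuming a hypothetical simplified State with w_size, head, and prev fields and a placeholder multiplicative hash (neither is zlib-rs's real code). The mask is computed once before the loop and reused, so the per-iteration w_size - 1 disappears. Because w_size is a power of two, idx & (w_size - 1) equals idx % w_size, which is what lets prev act as a ring buffer over the window.

```rust
/// Hypothetical simplified state; not zlib-rs's real State.
struct State {
    w_size: usize,  // window size, must be a power of two
    head: Vec<u16>, // hash bucket -> most recent position with that hash
    prev: Vec<u16>, // (position mod w_size) -> previous position in the chain
}

impl State {
    fn w_mask(&self) -> usize {
        self.w_size - 1
    }
}

/// Placeholder 3-byte multiplicative hash, for illustration only.
fn hash(window: &[u8], pos: usize, bits: u32) -> usize {
    let v = ((window[pos] as u32) << 16) | ((window[pos + 1] as u32) << 8) | (window[pos + 2] as u32);
    (v.wrapping_mul(0x9E37_79B1) >> (32 - bits)) as usize
}

/// Insert positions start..start + count into the hash chains,
/// computing the window mask once instead of once per iteration.
fn insert_strings(state: &mut State, window: &[u8], start: usize, count: usize, hash_bits: u32) {
    let w_mask = state.w_mask(); // hoisted out of the loop
    for idx in start..start + count {
        let h = hash(window, idx, hash_bits);
        let head = state.head[h];
        state.prev[idx & w_mask] = head; // link to the previous occurrence
        state.head[h] = idx as u16;
    }
}

fn main() {
    let hash_bits = 8;
    let mut state = State { w_size: 16, head: vec![0; 1 << hash_bits], prev: vec![0; 16] };
    let window = b"abcabcabcabcabcabcabcabcabc";
    insert_strings(&mut state, window, 0, 20, hash_bits);

    // For a power-of-two window, masking is the same as taking the remainder.
    for idx in 0..100usize {
        assert_eq!(idx & state.w_mask(), idx % state.w_size);
    }
    println!("most recent \"abc\" at {}", state.head[hash(window, 0, hash_bits)]);
}
```

The optimizer may or may not hoist the load and subtraction on its own (stores into prev can make it harder to prove w_size is loop-invariant), so hoisting by hand is worth measuring, as the benchmarks in this thread do.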

Collaborator

Similarly for the one above.

Author

With the update I just pushed, moving state.w_mask() out of the loop doesn't help much at level 2 on my system, but it produces some improvements at higher compression levels:

Benchmark 1 (68 runs): ./blogpost-compress-baseline 1 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          73.5ms ±  776us    72.5ms … 77.5ms          1 ( 1%)        0%
  peak_rss           26.6MB ± 74.0KB    26.5MB … 26.8MB          0 ( 0%)        0%
  cpu_cycles          283M  ±  743K      282M  …  286M           2 ( 3%)        0%
  instructions        544M  ±  285       544M  …  544M           0 ( 0%)        0%
  cache_references    267K  ± 18.3K      261K  …  400K           7 (10%)        0%
  cache_misses        228K  ± 9.18K      202K  …  238K           8 (12%)        0%
  branch_misses      2.91M  ± 6.68K     2.90M  … 2.94M           1 ( 1%)        0%
Benchmark 2 (69 runs): ./target/release/examples/blogpost-compress 1 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          72.8ms ±  841us    71.8ms … 78.8ms          2 ( 3%)          -  1.0% ±  0.4%
  peak_rss           26.6MB ± 70.6KB    26.3MB … 26.8MB          1 ( 1%)          -  0.1% ±  0.1%
  cpu_cycles          281M  ± 3.12M      279M  …  306M           2 ( 3%)          -  0.6% ±  0.3%
  instructions        549M  ±  289       549M  …  549M           0 ( 0%)        💩+  1.1% ±  0.0%
  cache_references    265K  ± 5.71K      261K  …  301K           6 ( 9%)          -  0.9% ±  1.7%
  cache_misses        230K  ± 6.91K      197K  …  238K           4 ( 6%)          +  0.7% ±  1.2%
  branch_misses      2.89M  ± 6.34K     2.88M  … 2.91M           0 ( 0%)          -  0.7% ±  0.1%
Benchmark 1 (42 runs): ./blogpost-compress-baseline 2 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           121ms ± 5.79ms     119ms …  157ms          1 ( 2%)        0%
  peak_rss           24.9MB ± 75.8KB    24.8MB … 25.1MB          0 ( 0%)        0%
  cpu_cycles          493M  ± 23.8M      487M  …  643M           1 ( 2%)        0%
  instructions       1.07G  ±  348      1.07G  … 1.07G           2 ( 5%)        0%
  cache_references    269K  ± 11.6K      264K  …  332K           4 (10%)        0%
  cache_misses        229K  ± 9.71K      200K  …  239K           7 (17%)        0%
  branch_misses      6.19M  ± 8.99K     6.17M  … 6.21M           2 ( 5%)        0%
Benchmark 2 (42 runs): ./target/release/examples/blogpost-compress 2 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           120ms ± 1.20ms     119ms …  124ms          1 ( 2%)          -  0.8% ±  1.5%
  peak_rss           24.9MB ± 66.1KB    24.8MB … 25.0MB          0 ( 0%)          +  0.1% ±  0.1%
  cpu_cycles          487M  ± 3.17M      484M  …  503M           4 (10%)          -  1.1% ±  1.5%
  instructions       1.08G  ±  288      1.08G  … 1.08G           0 ( 0%)        💩+  1.0% ±  0.0%
  cache_references    271K  ± 15.9K      264K  …  349K           6 (14%)          +  0.6% ±  2.2%
  cache_misses        230K  ± 10.1K      206K  …  246K           9 (21%)          +  0.7% ±  1.9%
  branch_misses      6.19M  ± 4.53K     6.19M  … 6.21M           0 ( 0%)          +  0.1% ±  0.0%
Benchmark 1 (37 runs): ./blogpost-compress-baseline 3 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           138ms ± 2.21ms     137ms …  148ms          4 (11%)        0%
  peak_rss           24.7MB ± 83.3KB    24.6MB … 24.8MB          0 ( 0%)        0%
  cpu_cycles          566M  ± 3.40M      564M  …  581M           2 ( 5%)        0%
  instructions       1.40G  ±  301      1.40G  … 1.40G           0 ( 0%)        0%
  cache_references    271K  ± 10.9K      265K  …  314K           4 (11%)        0%
  cache_misses        234K  ± 5.31K      215K  …  242K           2 ( 5%)        0%
  branch_misses      7.04M  ± 4.43K     7.04M  … 7.05M           0 ( 0%)        0%
Benchmark 2 (37 runs): ./target/release/examples/blogpost-compress 3 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           137ms ±  867us     136ms …  140ms          2 ( 5%)          -  0.7% ±  0.6%
  peak_rss           24.6MB ± 60.8KB    24.6MB … 24.8MB         11 (30%)          -  0.2% ±  0.1%
  cpu_cycles          563M  ± 2.51M      562M  …  575M           2 ( 5%)          -  0.5% ±  0.2%
  instructions       1.41G  ±  345      1.41G  … 1.41G           0 ( 0%)          +  0.8% ±  0.0%
  cache_references    269K  ± 5.48K      265K  …  291K           2 ( 5%)          -  0.5% ±  1.5%
  cache_misses        232K  ± 7.46K      207K  …  243K           6 (16%)          -  0.8% ±  1.3%
  branch_misses      7.07M  ± 4.28K     7.06M  … 7.08M           0 ( 0%)          +  0.3% ±  0.0%
Benchmark 1 (32 runs): ./blogpost-compress-baseline 4 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           161ms ±  563us     160ms …  163ms          1 ( 3%)        0%
  peak_rss           24.5MB ± 81.9KB    24.3MB … 24.7MB          1 ( 3%)        0%
  cpu_cycles          667M  ±  791K      666M  …  669M           0 ( 0%)        0%
  instructions       1.50G  ±  339      1.50G  … 1.50G           0 ( 0%)        0%
  cache_references    274K  ± 15.2K      264K  …  348K           2 ( 6%)        0%
  cache_misses        232K  ± 7.54K      207K  …  240K           3 ( 9%)        0%
  branch_misses      7.56M  ± 4.19K     7.56M  … 7.57M           0 ( 0%)        0%
Benchmark 2 (32 runs): ./target/release/examples/blogpost-compress 4 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           160ms ±  978us     159ms …  163ms          2 ( 6%)          -  0.6% ±  0.2%
  peak_rss           24.5MB ± 58.0KB    24.4MB … 24.7MB          0 ( 0%)          +  0.0% ±  0.1%
  cpu_cycles          663M  ± 3.79M      660M  …  677M           3 ( 9%)          -  0.6% ±  0.2%
  instructions       1.51G  ±  363      1.51G  … 1.51G           0 ( 0%)          +  0.8% ±  0.0%
  cache_references    275K  ± 20.3K      264K  …  371K           3 ( 9%)          +  0.6% ±  3.3%
  cache_misses        232K  ± 7.68K      207K  …  240K           4 (13%)          -  0.0% ±  1.6%
  branch_misses      7.57M  ± 6.14K     7.56M  … 7.58M           0 ( 0%)          +  0.1% ±  0.0%
Benchmark 1 (28 runs): ./blogpost-compress-baseline 5 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           180ms ± 1.84ms     179ms …  189ms          2 ( 7%)        0%
  peak_rss           24.5MB ± 89.1KB    24.3MB … 24.7MB          2 ( 7%)        0%
  cpu_cycles          751M  ± 6.22M      749M  …  782M           5 (18%)        0%
  instructions       1.72G  ±  319      1.72G  … 1.72G           1 ( 4%)        0%
  cache_references    279K  ± 51.0K      265K  …  537K           3 (11%)        0%
  cache_misses        231K  ± 9.08K      205K  …  240K           4 (14%)        0%
  branch_misses      8.25M  ± 7.67K     8.24M  … 8.27M           0 ( 0%)        0%
Benchmark 2 (29 runs): ./target/release/examples/blogpost-compress 5 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           178ms ± 2.26ms     177ms …  188ms          2 ( 7%)          -  0.8% ±  0.6%
  peak_rss           24.5MB ± 69.1KB    24.3MB … 24.6MB          1 ( 3%)          -  0.1% ±  0.2%
  cpu_cycles          741M  ±  891K      740M  …  743M           0 ( 0%)          -  1.3% ±  0.3%
  instructions       1.73G  ±  334      1.73G  … 1.73G           0 ( 0%)          +  0.7% ±  0.0%
  cache_references    271K  ± 10.7K      265K  …  321K           3 (10%)          -  2.8% ±  7.0%
  cache_misses        233K  ± 9.67K      209K  …  250K           8 (28%)          +  0.8% ±  2.2%
  branch_misses      8.22M  ± 7.87K     8.20M  … 8.24M           3 (10%)          -  0.4% ±  0.1%
Benchmark 1 (23 runs): ./blogpost-compress-baseline 6 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           221ms ±  882us     219ms …  223ms          0 ( 0%)        0%
  peak_rss           24.5MB ± 97.1KB    24.4MB … 24.7MB          0 ( 0%)        0%
  cpu_cycles          926M  ± 1.16M      925M  …  929M           1 ( 4%)        0%
  instructions       1.89G  ±  300      1.89G  … 1.89G           0 ( 0%)        0%
  cache_references    287K  ± 76.4K      266K  …  637K           1 ( 4%)        0%
  cache_misses        234K  ± 6.08K      217K  …  242K           2 ( 9%)        0%
  branch_misses      8.42M  ± 8.76K     8.40M  … 8.43M           0 ( 0%)        0%
Benchmark 2 (23 runs): ./target/release/examples/blogpost-compress 6 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           219ms ±  752us     218ms …  221ms          0 ( 0%)          -  1.1% ±  0.2%
  peak_rss           24.5MB ± 74.7KB    24.4MB … 24.7MB          0 ( 0%)          +  0.0% ±  0.2%
  cpu_cycles          915M  ±  785K      914M  …  917M           0 ( 0%)        ⚡-  1.2% ±  0.1%
  instructions       1.90G  ±  699      1.90G  … 1.90G           1 ( 4%)          +  0.6% ±  0.0%
  cache_references    272K  ± 7.03K      266K  …  293K           2 ( 9%)          -  5.1% ± 11.3%
  cache_misses        236K  ± 2.95K      229K  …  242K           0 ( 0%)          +  0.7% ±  1.2%
  branch_misses      8.40M  ± 7.02K     8.39M  … 8.41M           0 ( 0%)          -  0.2% ±  0.1%
Benchmark 1 (17 runs): ./blogpost-compress-baseline 7 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           306ms ± 1.01ms     305ms …  309ms          1 ( 6%)        0%
  peak_rss           24.4MB ± 83.9KB    24.2MB … 24.5MB          0 ( 0%)        0%
  cpu_cycles         1.28G  ± 1.64M     1.28G  … 1.29G           0 ( 0%)        0%
  instructions       2.28G  ±  328      2.28G  … 2.28G           0 ( 0%)        0%
  cache_references    272K  ± 6.44K      266K  …  295K           4 (24%)        0%
  cache_misses        235K  ± 5.73K      217K  …  239K           2 (12%)        0%
  branch_misses      9.64M  ± 6.96K     9.63M  … 9.65M           0 ( 0%)        0%
Benchmark 2 (17 runs): ./target/release/examples/blogpost-compress 7 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           301ms ± 4.07ms     299ms …  316ms          1 ( 6%)          -  1.6% ±  0.7%
  peak_rss           24.4MB ± 58.7KB    24.3MB … 24.5MB          0 ( 0%)          -  0.1% ±  0.2%
  cpu_cycles         1.26G  ± 6.00M     1.26G  … 1.28G           2 (12%)        ⚡-  1.9% ±  0.2%
  instructions       2.30G  ±  335      2.30G  … 2.30G           2 (12%)          +  0.6% ±  0.0%
  cache_references    289K  ± 41.7K      267K  …  404K           2 (12%)          +  5.9% ±  7.7%
  cache_misses        238K  ± 6.50K      222K  …  251K           3 (18%)          +  1.2% ±  1.8%
  branch_misses      9.58M  ± 14.8K     9.54M  … 9.60M           0 ( 0%)          -  0.7% ±  0.1%
Benchmark 1 (13 runs): ./blogpost-compress-baseline 8 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           400ms ± 2.21ms     394ms …  404ms          3 (23%)        0%
  peak_rss           24.4MB ± 76.2KB    24.3MB … 24.5MB          0 ( 0%)        0%
  cpu_cycles         1.69G  ±  954K     1.69G  … 1.69G           0 ( 0%)        0%
  instructions       2.73G  ±  300      2.73G  … 2.73G           0 ( 0%)        0%
  cache_references    273K  ± 5.98K      265K  …  286K           0 ( 0%)        0%
  cache_misses        235K  ± 4.99K      223K  …  242K           2 (15%)        0%
  branch_misses      9.77M  ± 7.98K     9.76M  … 9.79M           0 ( 0%)        0%
Benchmark 2 (13 runs): ./target/release/examples/blogpost-compress 8 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           394ms ±  503us     393ms …  395ms          0 ( 0%)        ⚡-  1.6% ±  0.3%
  peak_rss           24.4MB ± 71.8KB    24.2MB … 24.5MB          1 ( 8%)          +  0.0% ±  0.2%
  cpu_cycles         1.66G  ± 1.34M     1.66G  … 1.66G           0 ( 0%)        ⚡-  1.7% ±  0.1%
  instructions       2.75G  ±  164      2.75G  … 2.75G           0 ( 0%)          +  0.5% ±  0.0%
  cache_references    277K  ± 16.3K      266K  …  318K           0 ( 0%)          +  1.7% ±  3.6%
  cache_misses        237K  ± 4.41K      229K  …  245K           1 ( 8%)          +  0.7% ±  1.6%
  branch_misses      9.70M  ± 13.0K     9.68M  … 9.73M           0 ( 0%)          -  0.7% ±  0.1%
Benchmark 1 (12 runs): ./blogpost-compress-baseline 9 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           419ms ±  935us     418ms …  421ms          2 (17%)        0%
  peak_rss           24.4MB ± 49.4KB    24.2MB … 24.4MB          4 (33%)        0%
  cpu_cycles         1.76G  ± 1.51M     1.76G  … 1.77G           0 ( 0%)        0%
  instructions       3.35G  ±  313      3.35G  … 3.35G           1 ( 8%)        0%
  cache_references    279K  ± 24.6K      266K  …  355K           1 ( 8%)        0%
  cache_misses        235K  ± 7.01K      221K  …  242K           1 ( 8%)        0%
  branch_misses      15.4M  ± 17.0K     15.4M  … 15.5M           0 ( 0%)        0%
Benchmark 2 (12 runs): ./target/release/examples/blogpost-compress 9 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           422ms ± 1.51ms     420ms …  425ms          0 ( 0%)          +  0.7% ±  0.3%
  peak_rss           24.4MB ± 97.2KB    24.2MB … 24.6MB          0 ( 0%)          +  0.1% ±  0.3%
  cpu_cycles         1.77G  ± 2.27M     1.77G  … 1.78G           0 ( 0%)          +  0.4% ±  0.1%
  instructions       3.38G  ±  375      3.38G  … 3.38G           0 ( 0%)          +  1.0% ±  0.0%
  cache_references    278K  ± 8.24K      268K  …  296K           0 ( 0%)          -  0.1% ±  5.6%
  cache_misses        238K  ± 4.10K      230K  …  245K           0 ( 0%)          +  1.5% ±  2.1%
  branch_misses      15.8M  ± 40.1K     15.7M  … 15.9M           0 ( 0%)        💩+  2.2% ±  0.2%

Collaborator

folkertdev left a comment

Looks good, thanks!

folkertdev merged commit f10deb4 into trifectatechfoundation:main on May 28, 2025
24 checks passed
brian-pane deleted the cache-layout branch on May 28, 2025 21:29
3 participants