-
-
Notifications
You must be signed in to change notification settings - Fork 28
Optimize out some read operations from the fast deflate algorithm #375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention: Patch coverage is
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 4 files with indirect coverage changes 🚀 New features to boost your workflow:
|
Notes:
|
|
/* Find the longest match, discarding those <= prev_length. | ||
* At this point we have always match length < WANT_MIN_MATCH | ||
*/ | ||
// Find the longest match for the string starting at offset state.strstart. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did we lose the property that
At this point we have always match length < WANT_MIN_MATCH
I'd assume not, in which case I'd like to keep that bit of the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the new version, match_length
doesn't even exist until after that point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, the diff made that hard to see. Now the check for the length is right below the definition, so that will do.
let mut bflush; /* set if current block must be flushed */ | ||
let mut dist; | ||
let mut match_len = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Historically these variables were pre-declared as an optimization I think. But I do buy that it's not needed any more and in fact having the scope be more restricted might help in re-using registers/stack slots.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
excellent work, these are my local numbers
Benchmark 2 (41 runs): target/release/examples/blogpost-compress 2 rs silesia-small.tar
measurement mean ± σ min … max outliers delta
wall_time 125ms ± 1.51ms 123ms … 132ms 1 ( 2%) ⚡- 2.9% ± 0.6%
peak_rss 24.9MB ± 73.2KB 24.8MB … 25.0MB 0 ( 0%) - 0.0% ± 0.2%
cpu_cycles 500M ± 5.04M 493M … 523M 2 ( 5%) ⚡- 3.6% ± 0.6%
instructions 1.08G ± 379 1.08G … 1.08G 0 ( 0%) - 0.6% ± 0.0%
cache_references 33.7M ± 339K 33.3M … 34.8M 1 ( 2%) + 0.2% ± 0.5%
cache_misses 1.14M ± 181K 794K … 1.79M 1 ( 2%) + 8.9% ± 10.5%
branch_misses 6.26M ± 3.11K 6.26M … 6.27M 2 ( 5%) - 0.1% ± 0.0%
with all of the other levels having no significant changes.
/* Find the longest match, discarding those <= prev_length. | ||
* At this point we have always match length < WANT_MIN_MATCH | ||
*/ | ||
// Find the longest match for the string starting at offset state.strstart. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, the diff made that hard to see. Now the check for the length is right below the definition, so that will do.
No description provided.