Skip to content

Conversation

jayvdb
Copy link
Contributor

@jayvdb jayvdb commented Sep 23, 2020

0, 1, many principle for avoiding unnecessary instantiation of
ConcatenatedSequence and CombinatoricsSequence which only
act as placeholders.

This improves the benchmark results, and the overhead of additional
branches early performs well under typical workloads.

@jayvdb
Copy link
Contributor Author

jayvdb commented Sep 23, 2020

This is obviously a byproduct of the cases in the benchmark, optimising the most obvious parts of the worst cases. Building a more diverse benchmarking dataset would help validate that the introduced penalty is worth it for most usage.

@thatch
Copy link
Contributor

thatch commented Sep 23, 2020

Performance isn't the primary goal; did this change things a lot?

FWIW the cases like (?:(?:[a-z]{,100}){,100}){,100} are indeed contrived to ensure that cachingseq does its job; we initially computed all the lengths and I believe this one would actually OOM a 24GB system. I'm willing to accept a few percent of optimization left on the table in exchange for cleaner code.

@jayvdb
Copy link
Contributor Author

jayvdb commented Sep 23, 2020

The bench results are not very stable for me, but I see 15% performance improvement for some of them, and in my own workloads I see that scale of improvement but I havent done any timing yet.

While the performance is nice, the primary driver for me is that I am trying to simplify the internal state of sre_yield so I can use it for post-processing.

I've also got a fairly decent simplifier for the SRE parse tree at https://github.com/jayvdb/sre-tools/blob/develop/sre_tools/_simplify.py . It doesnt perform well yet.

@jayvdb
Copy link
Contributor Author

jayvdb commented Sep 24, 2020

If you like, I can rebase this PR on top of #32 to show the changes to the internal structures to the baseline.

@jayvdb jayvdb mentioned this pull request Sep 24, 2020
0, 1, many principle for avoiding unnecessary instantiation of
ConcatenatedSequence and CombinatoricsSequence which only
act as placeholders.

This improves the benchmark results, and the overhead of additional
branches early performs well under typical workloads.
@jayvdb
Copy link
Contributor Author

jayvdb commented Sep 27, 2020

Rebased and repr tests updated to illustrate the internal structural simplification and its ramifications.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants