Add prefixes to regex terminals #161

@jahav


I am working on replacing the handcrafted parser in ClosedXML (a library for manipulating xlsx files) with XLParser. Because an xlsx file can contain hundreds of thousands of formulas, I would like to improve XLParser's performance.

I would like to add prefixes to the RegexBasedTerminals.

Irony uses the first character of each prefix to build a lookup table from a character to the terminals that can start with it (terminals without a prefix are always considered). The scanner then uses this table to speed up finding the terminal at the current position.
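To illustrate the idea, here is a minimal Python sketch of such a first-character table. It is not Irony's code: the `Terminal` class, the helper functions and the four sample terminals are all hypothetical, and the real scanner is more involved. It only shows why a terminal with prefixes is skipped for most inputs, while a terminal without prefixes has to be tried every time.

```python
import re
from collections import defaultdict


class Terminal:
    """Hypothetical stand-in for a regex-based terminal (not Irony's class)."""

    def __init__(self, name, pattern, prefixes=()):
        self.name = name
        self.regex = re.compile(pattern)
        self.prefixes = tuple(prefixes)


def build_first_char_table(terminals):
    """Map the first character of each prefix to the terminals declaring it.

    Terminals without prefixes go to a fallback list that is always tried.
    """
    table = defaultdict(list)
    fallback = []
    for terminal in terminals:
        if not terminal.prefixes:
            fallback.append(terminal)
        for prefix in terminal.prefixes:
            if terminal not in table[prefix[0]]:
                table[prefix[0]].append(terminal)
    return dict(table), fallback


def candidates(table, fallback, text, pos):
    """Terminals worth trying at text[pos]: table hits plus the fallback list."""
    return table.get(text[pos], []) + fallback


def next_token(table, fallback, text, pos):
    """Return (terminal name, end position) of the longest match, or None."""
    best = None
    for terminal in candidates(table, fallback, text, pos):
        match = terminal.regex.match(text, pos)
        if match and match.end() > pos and (best is None or match.end() > best[1]):
            best = (terminal.name, match.end())
    return best


# Sample terminals, invented for this sketch only.
TERMINALS = [
    Terminal("error", r"#(REF!|N/A|VALUE!|DIV/0!)", prefixes=["#"]),
    Terminal("quoted_sheet", r"'[^']+'!", prefixes=["'"]),
    Terminal("number", r"\d+(\.\d+)?", prefixes=list("0123456789")),
    Terminal("name", r"[A-Za-z_][A-Za-z0-9_]*"),  # no prefix: always tried
]

table, fallback = build_first_char_table(TERMINALS)
```

With this table, input starting with `A` only runs the regex of the prefix-less `name` terminal, and input starting with `#` runs `error` plus `name`. Every regex terminal that gains a prefix drops out of the fallback list, which is where the saving would come from.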

I would also like to make the grammar case sensitive (a small but measurable improvement). The terminals already match both cases where necessary (e.g. a-zA-Z).

I have also tried changing the regex options of the terminals (through reflection): RegexOptions.ExplicitCapture (as recommended in best practices), RegexOptions.Compiled and RegexOptions.CultureInvariant, but there was no significant improvement.

I ran a benchmark on EnronFormulasParseTest (the test was modified to be single threaded). The parser version with prefixes takes about 44% less time (26.5 s instead of 47.3 s).

BenchmarkDotNet=v0.13.2, OS=Windows 11 (10.0.22000.856/21H2)
AMD Ryzen 5 5500U with Radeon Graphics, 1 CPU, 12 logical and 6 physical cores
.NET SDK=6.0.302
  [Host]     : .NET 6.0.7 (6.0.722.32202), X64 RyuJIT AVX2
  Job-JFUIAS : .NET 6.0.7 (6.0.722.32202), X64 RyuJIT AVX2

IterationCount=3  LaunchCount=1  WarmupCount=1

With prefixes

| Method                 |     Mean |    Error |   StdDev |
|----------------------- |---------:|---------:|---------:|
| EnronDataSet           | 26.496 s | 3.6721 s | 0.2013 s |
| EusesFormulasParseTest |  2.852 s | 0.0582 s | 0.0032 s |

Without prefixes

| Method                 |     Mean |    Error |   StdDev |
|----------------------- |---------:|---------:|---------:|
| EnronDataSet           | 47.295 s | 2.5500 s | 0.1398 s |
| EusesFormulasParseTest |  4.738 s | 0.3636 s | 0.0199 s |
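For reference, the relative improvement can be recomputed from the mean times in the two tables. The short Python sketch below does only that arithmetic. The function names are made up for illustration, and the numbers are copied from the results above.

```python
def time_reduction(before_s, after_s):
    """Fraction of the original run time that was saved."""
    return (before_s - after_s) / before_s


def speedup(before_s, after_s):
    """How many times faster the new run is (throughput ratio)."""
    return before_s / after_s


# Mean times in seconds: (without prefixes, with prefixes).
MEANS = {
    "EnronDataSet": (47.295, 26.496),
    "EusesFormulasParseTest": (4.738, 2.852),
}

for name, (before_s, after_s) in MEANS.items():
    print(
        f"{name}: {time_reduction(before_s, after_s):.1%} less time, "
        f"{speedup(before_s, after_s):.2f}x throughput"
    )
```

The run time drops by about 44% on the Enron data set and by about 40% on the Euses data set. With only three iterations per job, these figures are best read as rough estimates.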
