Skip to content

Improving the Diagonal Calculation #21

@Turnerj

Description

@Turnerj

A few points:

  • Diagonal calculation isn't effective with its use of caching vector-sized strings, specifically the target vector which needs to be flipped. It flips it multiple times because of how the data iterates. If the diagonal calculation could be both diagonal within a column the width of the vector, it would eliminate this.
  • Threading doesn't make use of the diagonal calculation because the boundaries cannot get the correct data easily (both the back boundary for the next thread and the forward boundary for the current thread). Using the opposite of the technique above, instead of diagonal within a column the width of the vector, do diagonal in the row the width of the vector.

Preliminary calculations on performance for single threaded improvement (take these with a grain of salt, they all don't calculate correctly):

No target lookups and no shuffles/permutations aka. Process Column (at best - we would still need one of these calls per vector but we would avoid a lot of duplicates)

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
Quickenshtein 23.67 ms 0.125 ms 0.117 ms - - - -

No source lookups aka. Process Row (at best - we would still need one of these calls per vector but we would avoid a lot of duplicates)

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
Quickenshtein 24.50 ms 0.143 ms 0.134 ms - - - 40 B

No source or target lookups (INVALID)

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
Quickenshtein 21.20 ms 0.138 ms 0.130 ms - - - 40 B

For comparison, it takes about 26.8 ms to calculate it currently with the regular diagonal calculation. This means at best, single threaded could have a 12% improvement however it is likely to be closer to 8% for the "correct" solution.

No idea about how the threading code would perform though.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or requesthelp wantedExtra attention is needed

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions