Optimizing the node location store #151

@joto

Profiling shows that the sparse_mmap_array node location store is faster than the others for small and medium datasets, but it still accounts for a large share of the processing time. This is probably due to the binary search over a large array of data, which defeats the CPU cache: essentially every memory access is a cache miss.
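
For context, a minimal sketch of what such a lookup looks like, assuming the entries are kept as one big sorted array of (id, location) pairs searched with std::lower_bound; the type and function names are placeholders, not the actual libosmium interface:

```cpp
#include <algorithm>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Placeholder types; the real store uses osmium::Location and its own
// index classes. Everything is packed into 64 bits here only to keep
// the sketch short.
using node_id  = std::uint64_t;
using location = std::uint64_t;
using entry    = std::pair<node_id, location>;

// Binary search over the full array: every probe touches a different
// cache line, so on a large dataset almost every probe is a cache miss.
location lookup(const std::vector<entry>& index, node_id id) {
    const auto it = std::lower_bound(
        index.begin(), index.end(), id,
        [](const entry& e, node_id value) { return e.first < value; });
    if (it == index.end() || it->first != id) {
        throw std::out_of_range{"node id not in index"};
    }
    return it->second;
}
```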

This could probably be optimized. Here are some ideas:

  • Store node IDs and locations not as pairs but in separate memory areas. Find the ID first, which gives us the offset into the location array, then get the location with a single lookup.
  • Store node IDs not as full 64-bit values, but split them across several arrays, each holding a subset of the bits.
  • Have some kind of compact lookup table with pointers into the main table to narrow down the search space.
  • Do a linear search instead of a binary search when we are "near" the ID we are looking for, for better cache performance.
  • Remember the position of the last ID(s) we looked up in a mini-cache and use it as the starting point for the next search. Because node IDs in ways are often close together, this could be a huge speedup (see the sketch after this list).
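
As an illustration of the last two ideas, here is a rough, hypothetical sketch. It is not the actual index interface; the class name, the fixed probe window of 16 entries, and the std::lower_bound fallback are assumptions chosen only to show the intended access pattern, and any real implementation would need to be benchmarked:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical sketch: remember where the last lookup ended and first do
// a short linear scan around that position, falling back to binary
// search only when the wanted ID is not nearby.
class cached_index {
public:
    using node_id  = std::uint64_t;
    using location = std::uint64_t;
    using entry    = std::pair<node_id, location>;

    explicit cached_index(std::vector<entry> entries)
        : m_entries(std::move(entries)) {}

    location get(node_id id) {
        // 1. Linear probe around the position of the previous hit.
        //    Node IDs referenced by a way are usually close together,
        //    so this often stays within one or two cache lines.
        constexpr std::size_t window = 16; // assumed size, needs tuning
        const std::size_t begin = m_last > window ? m_last - window : 0;
        const std::size_t end   = std::min(m_last + window, m_entries.size());
        for (std::size_t i = begin; i < end; ++i) {
            if (m_entries[i].first == id) {
                m_last = i;
                return m_entries[i].second;
            }
        }

        // 2. Fall back to binary search over the whole array.
        const auto it = std::lower_bound(
            m_entries.begin(), m_entries.end(), id,
            [](const entry& e, node_id value) { return e.first < value; });
        if (it == m_entries.end() || it->first != id) {
            throw std::out_of_range{"node id not in index"};
        }
        m_last = static_cast<std::size_t>(it - m_entries.begin());
        return it->second;
    }

private:
    std::vector<entry> m_entries; // sorted by node ID
    std::size_t m_last = 0;       // position of the last successful lookup
};
```

The point of the linear probe is that consecutive node references from the same way usually land in the same few cache lines, so the full binary search only runs when a way jumps to a distant part of the ID space.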

And here is some background reading:

We have to try different ideas and benchmark them.
