Profiling shows that the sparse_mmap_array node location store is faster than the others for small and medium datasets, but it still takes a huge amount of processing time. This is probably due to the binary search over a large array of data, which totally defeats the CPU cache: basically every memory access is a cache miss.
This could probably be optimized. Here are some ideas:
- Store node IDs and locations not as pairs but in separate memory areas. Find the ID first, which gives us the offset into the array, then get the location with one lookup.
- Store node IDs not as full 64-bit values, but split them across several arrays, each holding a subset of the bits.
- Have some kind of compact lookup table with pointers into the main table, to narrow down the search space.
- Do a linear search instead of a binary search when we are "near" the ID we are looking for, for better cache performance.
- Remember the position of the last ID(s) we looked up in a mini-cache as the starting point for the next search. Because node IDs in ways are often close together, this could be a huge speedup.
And here is some background reading:
- https://geidav.wordpress.com/2013/12/29/optimizing-binary-search/
- http://www.cs.utexas.edu/~pingali/CS395T/2013fa/lectures/MemoryOptimizations_2013.pdf
- http://blog.teamleadnet.com/2014/06/beating-binary-search-algorithm.html
- http://bannalia.blogspot.de/2015/06/cache-friendly-binary-search.html
- http://cglab.ca/~morin/misc/arraylayout/
We have to try different ideas and benchmark them.