libexpr: Implement small string optimization for Value #13895
static constexpr std::size_t smallStringStorageSize = std::max({
#define NIX_VALUE_STORAGE_FIELD_SIZE(T, FIELD_NAME, DISCRIMINATOR) sizeof(T),
    NIX_VALUE_STORAGE_FOR_EACH_FIELD(NIX_VALUE_STORAGE_FIELD_SIZE)
#undef NIX_VALUE_STORAGE_DEFINE_FIELD
- #undef NIX_VALUE_STORAGE_DEFINE_FIELD
+ #undef NIX_VALUE_STORAGE_FIELD_SIZE
(undef'd the wrong thing)
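For readers unfamiliar with the pattern in the hunk above, here is a minimal, self-contained sketch of taking the maximum `sizeof` over an X-macro field list. All `EXAMPLE_*` names and `ExamplePair` are hypothetical stand-ins, not the actual Nix definitions; the point is only that the helper macro defined for the expansion is the one that should be undefined afterwards.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical payload type, used only for this illustration.
struct ExamplePair
{
    void * first;
    void * second;
};

// Hypothetical field list: each entry is (type, field name, discriminator).
#define EXAMPLE_FOR_EACH_FIELD(MACRO) \
    MACRO(std::int64_t, integer, 1)   \
    MACRO(double, fpoint, 2)          \
    MACRO(ExamplePair, pair, 3)

// Each entry expands to `sizeof(T),`; the trailing comma is allowed in a
// braced initializer list, and std::max over the list is constexpr.
constexpr std::size_t exampleMaxFieldSize = std::max({
#define EXAMPLE_FIELD_SIZE(T, FIELD_NAME, DISCRIMINATOR) sizeof(T),
    EXAMPLE_FOR_EACH_FIELD(EXAMPLE_FIELD_SIZE)
#undef EXAMPLE_FIELD_SIZE
});

static_assert(exampleMaxFieldSize >= sizeof(double), "covers every listed field");
```

Undefining a different name than the one just defined compiles silently, because `#undef` of an unknown macro is not an error, which is why this kind of slip is easy to miss.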
|
🎉 All dependencies have been resolved ! |
internalType = tSmallString;
payload.smallString = {};
/* Trick is the same as in Facebook's Folly string. Use the last byte
   of the string to store the remaining capacity. This was it naturally
- of the string to store the remaining capacity. This was it naturally
+ of the string to store the remaining capacity. This way it naturally
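The code comment under review is cut off in this hunk, so the following is only a sketch of the general technique it names, not the PR's implementation. It assumes a hypothetical 16-byte inline buffer (`SmallStringSketch`, `storageSize`); the last byte stores the remaining capacity, which is 0 exactly when the buffer is full, so that byte then also serves as the NUL terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Hypothetical inline string; the buffer size is an assumption for illustration.
struct SmallStringSketch
{
    static constexpr std::size_t storageSize = 16;
    static constexpr std::size_t maxLength = storageSize - 1;

    char data[storageSize];

    // Copies `s` into the inline buffer. Returns false when it does not fit.
    bool assign(std::string_view s)
    {
        if (s.size() > maxLength)
            return false;
        std::memset(data, 0, storageSize);
        std::memcpy(data, s.data(), s.size());
        // Last byte holds the remaining capacity. For a string of maximal
        // length this is 0, so the same byte doubles as the NUL terminator.
        data[storageSize - 1] = static_cast<char>(maxLength - s.size());
        return true;
    }

    // Length is recovered from the stored remaining capacity, no strlen needed.
    std::size_t size() const
    {
        return maxLength - static_cast<unsigned char>(data[storageSize - 1]);
    }

    const char * c_str() const
    {
        return data;
    }
};
```

With this layout a 16-byte buffer holds strings of up to 15 bytes while still exposing both an explicit length and a NUL-terminated view.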
Unfortunately this doesn't look like a very productive optimization in practice. I'm only seeing a 0.2% memory usage drop on
Do we already intern strings? For example, x86_64-linux will be very common.
Only for attribute identifiers, which end up in the symbol table. I once tried an optimization that placed ALL strings into the symbol table. This makes GC faster and improves sharing, at the cost of some non-GC'd memory.
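To make the trade-off concrete, here is a rough sketch of string interning in general. `InternTableSketch` is a hypothetical name and this is not how the Nix symbol table is implemented; it only shows that equal strings share one allocation, and that the table owns its entries for its whole lifetime (the "non-GC'd memory" cost).

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_set>

// Hypothetical intern table, for illustration only.
class InternTableSketch
{
    // std::unordered_set never relocates its elements, so pointers to them
    // stay valid across later insertions and rehashes.
    std::unordered_set<std::string> strings;

public:
    // Returns a stable pointer; equal inputs always yield the same pointer.
    const std::string * intern(std::string_view s)
    {
        return &*strings.emplace(s).first;
    }

    std::size_t size() const
    {
        return strings.size();
    }
};
```

Interned strings can then be compared by pointer, and a frequently repeated value such as a system name is stored once.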
Unfortunately this doesn't seem like a very sustainable choice. I'm leaning in the direction of Pascal-style (length-prefixed) strings with ropes (twines). That will help significantly with the implementation of lazy paths (@roberth suggested this at some point IIRC).
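As a rough illustration of that direction (an assumption-laden sketch, not a proposed design for libexpr), a rope can be modelled as a tree whose nodes store their length explicitly, so concatenation is O(1) and the bytes are only copied when the string is flattened. `RopeSketch`, `makeLeaf`, `concat` and `flatten` are hypothetical names.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical rope node: either a leaf (both children null) or a concatenation.
struct RopeSketch
{
    std::size_t length = 0; // stored explicitly, Pascal-style
    std::string leaf;       // payload, used only by leaf nodes
    std::shared_ptr<const RopeSketch> left, right;
};

using Rope = std::shared_ptr<const RopeSketch>;

Rope makeLeaf(std::string s)
{
    auto node = std::make_shared<RopeSketch>();
    node->length = s.size();
    node->leaf = std::move(s);
    return node;
}

// O(1): no bytes are copied, only the total length is computed.
Rope concat(Rope a, Rope b)
{
    auto node = std::make_shared<RopeSketch>();
    node->length = a->length + b->length;
    node->left = std::move(a);
    node->right = std::move(b);
    return node;
}

void flattenInto(const Rope & r, std::string & out)
{
    if (!r->left) {
        out += r->leaf;
        return;
    }
    flattenInto(r->left, out);
    flattenInto(r->right, out);
}

// Materializes the rope; the known length allows a single allocation.
std::string flatten(const Rope & r)
{
    std::string out;
    out.reserve(r->length);
    flattenInto(r, out);
    return out;
}
```

Because nodes are immutable and shared, two ropes built from the same prefix reuse it instead of copying it.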
Some attempts:
But these tended to end up with bugs that I wasn't the best at resolving. Perhaps a new approach?
I am curious if we can use thunks to get ropes "for free". Basically force each side of the
Another alternative is https://sinusoid.es/immer/containers.html#flex-vector. |
Tom shared his WIP branch for this above, though it's not necessarily what we need. We'd want to have lazy thunk segments to accommodate lazy paths.
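One way to picture "lazy thunk segments" is a string made of segments that are either already-forced text or a deferred computation that runs at most once. This is purely a hypothetical sketch (`LazySegmentSketch` and `forceAll` are invented names, and `std::function` stands in for an evaluator thunk), not the WIP branch mentioned above.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical segment: holds either a forced value or a pending thunk.
class LazySegmentSketch
{
    std::function<std::string()> thunk;
    std::optional<std::string> value;

public:
    explicit LazySegmentSketch(std::string s)
        : value(std::move(s))
    {
    }

    explicit LazySegmentSketch(std::function<std::string()> f)
        : thunk(std::move(f))
    {
    }

    bool forced() const
    {
        return value.has_value();
    }

    // Runs the thunk on first use and caches the result.
    const std::string & force()
    {
        if (!value) {
            value = thunk();
            thunk = nullptr; // release anything the thunk captured
        }
        return *value;
    }
};

// Forcing the whole string forces every segment, in order.
std::string forceAll(std::vector<LazySegmentSketch> & segments)
{
    std::string out;
    for (auto & s : segments)
        out += s.force();
    return out;
}
```

A lazy path could then stay unforced (for example, not copied to the store) until something actually needs the resulting string.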
Motivation
This attempts to store small strings inline in the Value
struct. This is possible on 64-bit systems that are
little-endian, or on other systems where the pointer tagging optimization is not used.
Another reason for this change is to start storing the string
length explicitly (at least for the small string case for now).
Context
Depends on #13890.
Rebase/reimplementation of #9895.
Add 👍 to pull requests you find important.
The Nix maintainer team uses a GitHub project board to schedule and track reviews.