Skip to content

Conversation

@xokdvium
Copy link
Contributor

@xokdvium xokdvium commented Sep 2, 2025

Motivation

This attempts to store small strings inline in the Value
struct. This is possible to do for 64 bit systems that are
little endian or on other systems where pointer tagging optimization is not used.

Another reason for this change is to start storing the string
length explicitly (at least for the small string case for now).

Context

Depends on #13890.
Rebase/reimplementation of #9895.


Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

@xokdvium xokdvium requested a review from tomberek September 2, 2025 00:51
@github-actions github-actions bot added the new-cli Relating to the "nix" command label Sep 2, 2025
@xokdvium xokdvium marked this pull request as draft September 2, 2025 00:55
static constexpr std::size_t smallStringStorageSize = std::max({
#define NIX_VALUE_STORAGE_FIELD_SIZE(T, FIELD_NAME, DISCRIMINATOR) sizeof(T),
NIX_VALUE_STORAGE_FOR_EACH_FIELD(NIX_VALUE_STORAGE_FIELD_SIZE)
#undef NIX_VALUE_STORAGE_DEFINE_FIELD
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#undef NIX_VALUE_STORAGE_DEFINE_FIELD
#undef NIX_VALUE_STORAGE_FIELD_SIZE

(undef'd the wrong thing)

@dpulls
Copy link

dpulls bot commented Sep 2, 2025

🎉 All dependencies have been resolved !

internalType = tSmallString;
payload.smallString = {};
/* Trick is the same as in Facebook's Folly string. Use the last byte
of the string to store the remaining capacity. This was it naturally
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
of the string to store the remaining capacity. This was it naturally
of the string to store the remaining capacity. This way it naturally

This attempts to store small strings inline in the Value
struct. This is possible to do for 64 bit systems that are
little endian or on other systems where pointer tagging optimization is not used.

Another reason for this change is to start storing the string
length explicitly (at least for the small string case for now).
@xokdvium
Copy link
Contributor Author

xokdvium commented Sep 5, 2025

Unfortunately this doesn't look like a very productive optimization in practice. I'm only seeing a 0.2% memory usage drop on nix-env --query --available --out-path --file ../nixpkgs --eval-system x86_64-linux. This is a drop in the bucket and there's a lot that we can do better.

@Mic92
Copy link
Member

Mic92 commented Sep 6, 2025

Do we already do string interning for strings? i.e. x86_64-linux will be very common.

@tomberek
Copy link
Contributor

tomberek commented Sep 7, 2025

string interning

Only for attribute identifiers, which end up in the symbol table. I once tried an optimization, place ALL strings into the symbol table. This makes GC faster, better sharing, at the cost of some non-GC'd memory.

@xokdvium
Copy link
Contributor Author

xokdvium commented Sep 7, 2025

place ALL strings into the symbol table

Unfortunately this doesn't seem like a very sustainable choice. I'm leaning in the direction of pascal-style strings with ropes (twines). That will help significantly with the implementation of lazy paths (@roberth suggested this at some point IIRC).

@tomberek
Copy link
Contributor

tomberek commented Sep 7, 2025

Unfortunately this doesn't seem like a very sustainable choice. I'm leaning in the direction of pascal-style strings with ropes (twines). That will help significantly with the implementation of lazy paths (@roberth suggested this at some point IIRC).

Some attempts:

But these tended to get to end up with bugs that I wasn't the best at resolving. Perhaps a new approach?

@Ericson2314
Copy link
Member

I am curious if we can use thunks to get ropes "for free". Basically force each side of the + or ++ but then just leave the primop in place.

@xokdvium
Copy link
Contributor Author

xokdvium commented Sep 7, 2025

Another alternative is https://sinusoid.es/immer/containers.html#flex-vector.

@Mic92

This comment was marked as duplicate.

@xokdvium
Copy link
Contributor Author

xokdvium commented Sep 9, 2025

Interestingly our gc library also has this as a feature:

Tom shared his WIP branch for this above. Though it's not what we necessarily need. We'd want to have lazy thunk segments to accommodate lazy paths.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

new-cli Relating to the "nix" command

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants