Skip to content

Conversation

@SingleAccretion
Copy link
Contributor

@SingleAccretion SingleAccretion commented Jan 12, 2024

This introduces the INITIAL_HEAP setting originally discussed in #20888 to Emscripten. This setting is intended to supersede INITIAL_MEMORY, which requires all consumers of the Emscripten toolchain to guesstimate its intended value. E. g. this was just recently hit again in dotnet/runtime#96840, dotnet/runtime#97634.

This change uses INITIAL_HEAP by default in the following configurations:

  1. Where memory is created in WASM in exported to JS. Opposite cases range from fundamentally unsupportable (user-supplied memory object) to not yet implemented due to my lack of familiarity with the code. Thoughts on how to fix the latter are very welcome.
  2. Where the user specified neither INITIAL_MEMORY nor MAXIMUM_MEMORY. Since INITIAL_HEAP adds to the static data and stack, and those settings place an upper bound on the size of initial memory, enabling INITIAL_HEAP for users who specify these settings would be a breaking change, with the possible consequence of a compilation failure.

Since INITIAL_HEAP is additive, using it by default increases the initial memory by "size of stack + size of static data" (due to backwards compatibility, we cannot set default INITIAL_HEAP lower than the previous default for INITIAL_MEMORY). I do not expect this increase to be noticeable in typical applications, but in certain edge cases (no memory growth, very large static data), it can be. In the worst case of an application using the whole 16MB for static data, this would double the memory usage. We can consider only enabling INITIAL_HEAP in ALLOW_MEMORY_GROWTH=1 builds if we consider it unacceptable to regress these kinds of applications.

Another effect of INITIAL_HEAP-by-default is initial memory for ALLOW_MEMORY_GROWTH=0 address-sanitized builds. Due to the uncertainty of static data size, the code now falls back to estimating using MAXIMUM_MEMORY, which results in the same experience (of large initial memory usage) as with ALLOW_MEMORY_GROWTH=1. Here, we can consider falling back to INITIAL_MEMORY if the user has this configurations.

@SingleAccretion SingleAccretion force-pushed the Initial-Heap branch 4 times, most recently from 6be67f3 to eb88bb2 Compare January 14, 2024 21:29
Changes in default behavior:
1) INITIAL_HEAP is the new default for most builds.
   This means that there is an increase in the effective
   initial memory used by "sizeof(stack) + sizeof(static data)".
   In typical small applications this should be on the order
   of half a megabyte.
2) Because we cannot precisely calculate the amount
   of initial memory now, ASAN support will use
   the conservative upper estimate of MAXIMUM_MEMORY.
   This only affects ALLOW_MEMORY_GROWTH=0 builds.

This change does not yet enable INITIAL_HEAP for builds
that instantiate the memory in JS, e. g. with threading.
It seems this wasn't working for not-MINIMAL_RUNTIME
for similar reasons for some time already.
The tests hardcode the initial memory amount.
Make them robust against changes in defaults.
1 byte increase due to this diff:

-  (memory (;0;) 256 256)
+  (memory (;0;) 258 32768)

We no longer cap the maximum memory.
@SingleAccretion SingleAccretion marked this pull request as ready for review January 16, 2024 11:41
@SingleAccretion
Copy link
Contributor Author

test-browser-chrome-wasm64 failures look preexisting (unrelated run).

@sbc100

tools/link.py Outdated
settings.INITIAL_HEAP = -1
else:
# Some of these could (and should) be implemented.
exit_with_error('INITIAL_HEAP is currently not compatible with IMPORTED_MEMORY, SHARED_MEMORY, RELOCATABLE, ASYNCIFY_LAZY_LOAD_CODE, WASM2JS')
Copy link
Contributor Author

@SingleAccretion SingleAccretion Jan 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. IMPORTED_MEMORY - this is just harmless, but I didn't want to complicate this further trying to detect if IMPORTED_MEMORY is implicit or not.
  2. SHARED_MEMORY - ?.
  3. RELOCATABLE - not sure how the memory scheme here works. Do the side modules import the main module's memory?
  4. ASYNCIFY_LAZY_LOAD_CODE - ?.
  5. WASM2JS - apparently it is a code size regression to define memory in WASM. It is also a legacy mode.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a new error? Right now I can do emcc -sINITIAL_MEMORY=32mb -sIMPORTED_MEMORY hello.c without any error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error is under if 'INITIAL_HEAP' in user_settings, so it should only apply if you try to use INITIAL_HEAP with an unsupported (unimplemented) configuration.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I'm just confirming that that was not previously an error.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about INITIAL_HEAP is currently not compatible with imported memoy?

Also, I think WASM2JS should still supported INITIAL_HEAP even through the memory is technically imported (since we still pragmatically create the memory with the correct size). Can you explain why this is not possible?

I wonder if we should switch wasm2js to not use imported memory before we land this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about INITIAL_HEAP is currently not compatible with imported memory?

I think it would technically work, but it's a bit misleading. There are two cases:

  1. IMPORTED_MEMORY is set because the user set it, and they want to provide the memory object themselves. In this case, both INITIAL_MEMORY and INITIAL_HEAP are basically meaningless. The only reason this scenario is blocked in this change is for simplicity: I didn't want to write code to detect whether IMPORTED_MEMORY is set by the user, or implicitly, or both. I presume that explicit IMPORTED_MEMORY is rare.
  2. IMPORTED_MEMORY is set as an internal implementation detail of Emscripten, where it is convenient/necessary to instantiate the memory object in JS. 'Leaking' that into documentation would seem odd.

Also, I think WASM2JS should still supported INITIAL_HEAP even through the memory is technically imported (since we still pragmatically create the memory with the correct size). Can you explain why this is not possible?

I believe it is possible to switch all INITIAL_MEMORY cases to INITIAL_HEAP and delete the error. It is just work I have not done yet. For it to be done, I basically need to understand what are the reasons for implicit IMPORTED_MEMORY in all the cases that it is set, and what is the optimal solution in each case.

For WASM2JS specifically, it seems like there is a code size optimization in place where doing imported memory "manually" saves code size compared to the code generated by the wasm2js tool for WASM modules that define their own memory. So here the fix could be to improve the wasm2js tool, or read out the defined memory size from the linked WASM binary (this could be the solution to other cases too), or something else.

@SingleAccretion
Copy link
Contributor Author

...ping...

Copy link
Collaborator

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remind me again of the use cases for INITIAL_MEMORY once we have this new setting?

Should we at least start to recommend this setting over INITIAL_MEMORY and try to push folks towards using it?

Can you add a ChangeLog entry?

Otherwise lgtm with some comments.

if expected_initial_heap != 0:
self.assertContained('--initial-heap=' + str(expected_initial_heap), out.stderr)
else:
self.assertNotContained('--initial-heap=', out.stderr)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible, I'd prefer not to test this by peeking inside like this but I guess there is no other easy way?

I suppose we could compare __heap_base with __builtin_wasm_memory_size within the code? But that might not work for == 0 case here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was an intentional decision do this sort of "white box" testing. There were two reasons:

  1. It's easier. As you mention, it is somewhat non-obvious how to test the 0 case.
  2. It's composable in a sense that it tests only the functionality we're adding (i. e. we're not adding the feature that lays out a memory in a particular way, we're adding a feature that makes it such that a particular argument is passed to the linker).

The upside is that it is more resilient against changes in the other parts of the system. The downside is that it assumes the process involves invoking the linker, but it seems unlikely that the linker will go away any time soon (interestingly, writing a test with __heap_base would also assume a very particular linker implementation).

@SingleAccretion
Copy link
Contributor Author

Can you remind me again of the use cases for INITIAL_MEMORY once we have this new setting?

Two cases:

  1. Backwards compatibility.
  2. Unimplemented cases with IMPORTED_MEMORY.

The latter is the bigger of the two of course, as it prevents using the setting with e. g. threading.

Should we at least start to recommend this setting over INITIAL_MEMORY and try to push folks towards using it?

I had some reservations about it initially, due to the above issue, but have gone ahead and added some text to this effect.

@SingleAccretion
Copy link
Contributor Author

Looks like test-browser-chrome failure is again pre-existing (example).

Copy link
Collaborator

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm.. even though I'm a little sad to add more complexity here.

@kripken WDYT?

Copy link
Collaborator

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm with a couple of final comments.

Note that this will ignore user-supplied MAXIMUM_MEMORY when
growth is disabled. This is consistent with existing behavior.
Implementing ALLOW_MEMORY_GROWTH=0 properly revealed
an invalid assumption in the test: that MAXIMUM_MEMORY
is respected under ALLOW_MEMORY_GROWTH=0 (default).

Since this is not true, pass -sALLOW_MEMORY_GROWTH=1
alongside MAXIMUM_MEMORY.
@sbc100 sbc100 changed the title Initial INITIAL_HEAP support Initial new INITIAL_HEAP setting Mar 1, 2024
@sbc100 sbc100 enabled auto-merge (squash) March 1, 2024 18:31
@sbc100 sbc100 merged commit 22a1c04 into emscripten-core:main Mar 1, 2024
@SingleAccretion
Copy link
Contributor Author

@sbc100 thank you for all the support in getting this through!

@sbc100
Copy link
Collaborator

sbc100 commented Mar 1, 2024

@sbc100 thank you for all the support in getting this through!

Thanks for contributing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants