Skip to content

Conversation

@qinsoon
Copy link
Member

@qinsoon qinsoon commented Oct 12, 2021

This PR adds a new API call disable_collection(), which can be used in pair with enable_collection() to temporarily allow allocating without triggering a GC. This closes #457.

@qinsoon qinsoon added the PR-testing Run binding tests for the pull request (deprecated: use PR-extended-testing instead) label Oct 12, 2021
@qinsoon qinsoon marked this pull request as ready for review October 18, 2021 23:31
@qinsoon qinsoon requested a review from caizixian October 18, 2021 23:31
@qinsoon qinsoon requested review from wks and removed request for caizixian October 25, 2021 02:15
Copy link
Collaborator

@wks wks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I think we now have two boolean states:

  1. "initialized" vs "uninitialized", i.e. initialized
  2. "enabled" vs "disabled", i.e. trigger_gc_when_heap_is_full

Now we may have a state where is_initialized()==false but is_gc_enabled() == true. This is a bit confusing. enable_collection and disable_collection seem to be switching both Boolean variables. It is confusing, too. I think it is worth renaming those functions.

@qinsoon
Copy link
Member Author

qinsoon commented Oct 25, 2021

Overall, I think we now have two boolean states:

  1. "initialized" vs "uninitialized", i.e. initialized
  2. "enabled" vs "disabled", i.e. trigger_gc_when_heap_is_full

Now we may have a state where is_initialized()==false but is_gc_enabled() == true. This is a bit confusing. enable_collection and disable_collection seem to be switching both Boolean variables. It is confusing, too. I think it is worth renaming those functions.

is_initialized()==false and is_gc_enabled() == true is intentionally allowed. For NoGC, it seems reasonable that the binding does not need to initialize the collection (initialized == false). We need to properly report OOM if they exceed the heap size (enabled == true/trigger_gc_when_heap_is_full == true). See #214.

Do you think it will be more clear if we separate initialize_collection() from enable/disable_collection()? Do you have more specific suggestions?

@wks
Copy link
Collaborator

wks commented Oct 25, 2021

is_initialized()==false and is_gc_enabled() == true is intentionally allowed. For NoGC, it seems reasonable that the binding does not need to initialize the collection (initialized == false). We need to properly report OOM if they exceed the heap size (enabled == true/trigger_gc_when_heap_is_full == true). See #214.

Do you think it will be more clear if we separate initialize_collection() from enable/disable_collection()? Do you have more specific suggestions?

I think it is worth the separation. TL;DR: My suggestion is mandating initialize_collection(), and also provide a pair of enable/disable_collection() APIs.

There are multiple reasons why VMs may not want "GC" to happen.

  1. The VM only uses NoGC, so it doesn't make sense to enable GC. Creating the scheduler and starting GC threads will waste resources.
  2. The VM does not have necessary metadata ready (such as object maps, i.e. "Klass"es in OpenJDK). If GC happens at that time, it will cause an error.
  3. The VM or the application does not want "GC to happen at certain time". Nobody knows why they do that. Maybe the programmers are told that disabling GC will result in 90% performance gain. No matter what the programmers expect, we just do what they asked for, i.e. "not doing GC", whatever "not doing GC" means.
  4. Some researchers want to know how a GC algorithm perform when GC is never triggered. We just need a switch to disable certain GC activities in certain GC algorithms (nursery GC, full GC, defrag, or all of them). It is a white-box approach.

I don't think any VM would use MMTk but only ever use NoGC. Usually, a VM will allow the user to select a GC algorithm (-XX:+UseXxxxGC). So I prefer providing the VM with a algorithm-agnostic interface. For example, "the VM should always call function xxxx after doing yyyy" instead of "if the VM only uses NoGC, it doesn't need to call function xxxx". Of course MMTk may choose not to create GC scheduler and worker threads if it detects NoGC is selected.

Among the four reasons listed above, I think only (2) is related to correctness. The VM must tell MMTk when it is okay to do GC. But I think we need to be careful here. So far, our GC algorithms only scan objects during some explicit "GC" periods (nursery GC, full GC, etc.). I suppose there are other GC algorithms that need to read GC metadata in barriers, too. Naive RC, for example, frees objects recursively after DEC-ing an object to 0. Maybe such algorithms are stupid. I don't know, but I think it is better to rename such an API function to object_scanning_ready or something like that, telling MMTk exactly what is ready after that point of time. Of course MMTk cannot do GC when the VM is not ready for object scanning. But if barriers (maybe slow paths only) depend on object scanning, we have to bypass the barriers, too, until the VM says it is ready.

As for (3), it is hard to classify what exactly is a "GC". Deferred RC and RC-Immix can recycle objects without even partially scanning the heap. Do MMTk developers and application programmers count that as "GC"? However, in the case of (4), it basically gives the user fine-grained control to GC details.

I think this PR is to fix issue #457 which address (3), i.e. implementing Julia's enable/disable GC API. I think disabling "triggering GC when unable to acquire more pages" is good enough for Julia. For (4), we may leave that for a different PR, or maybe the researchers would prefer directly modifying the source code.

For #214, why don't we do what JikesRVM MMTk does? In JikesRVM, if Plan.isInitialized() == false, it will not do the polling, but if it runs out of memory, it will eventually fail, too, after failing to obtain new pages from PageResource.

@qinsoon qinsoon removed the PR-testing Run binding tests for the pull request (deprecated: use PR-extended-testing instead) label Oct 26, 2021
@qinsoon
Copy link
Member Author

qinsoon commented Oct 26, 2021

jikesrvm-binding-test
JIKESRVM_BINDING_REF=update-initialize-collection

@qinsoon
Copy link
Member Author

qinsoon commented Oct 26, 2021

openjdk-binding-test
OPENJDK_BINDING_REF=update-initialize-collection

@qinsoon
Copy link
Member Author

qinsoon commented Oct 26, 2021

v8-binding-test
V8_BINDING_REF=update-initialize-collection

@qinsoon qinsoon added the PR-testing Run binding tests for the pull request (deprecated: use PR-extended-testing instead) label Oct 26, 2021
Copy link
Collaborator

@wks wks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we have four states w.r.t. initialized (VM initialzing GC) and gc_enabled (user enable/disabling GC).

  1. initialized == false && gc_enabled == true: VM has just started. When heap is full, it should panic.
  2. initialized == true && gc_enabled == true: VM initialized GC, and is probably running app now. When heap is full, it should do GC.
  3. initialized == true && gc_enabled == false: The user explicitly disabled GC. When heap is full, it should keep allocating until running out of physical memory.
  4. initialized == false && gc_enabled == false: This state is rare, but still reachable. During VM booting process, the VM may disable GC so that it may temporarily exceed heap size. Then it initializes the collector and enables GC, and the heap shrinks back to the specified heap size. In this state, it should also keep allocating until running out of physical memory.

Another thing worth further discussion is whether we still allow user-triggered GC (System.gc() or jl_gc_collect()) when the user explicitly disabled GC.

@qinsoon
Copy link
Member Author

qinsoon commented Oct 26, 2021

I have pushed my changes. Can you check if those changes addressed your comments and if there are other required changes? @wks

Copy link
Collaborator

@wks wks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now.

@qinsoon qinsoon merged commit 0ededb3 into master Oct 27, 2021
@qinsoon qinsoon deleted the disable-collection branch October 27, 2021 04:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

PR-testing Run binding tests for the pull request (deprecated: use PR-extended-testing instead)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support disable_collection()

4 participants