
Conversation

@Maoni0 Maoni0 (Member) commented Aug 11, 2025

A customer reported much longer and more frequent waits on threads allocating UOH objects during a BGC with regions than with segments. This can be observed in the "LOH allocation pause (due to background GC) > 200 msec Events" table in the GCStats view in PerfView. It happens because we have a policy that says "if the begin UOH size is 2x the last gen2 GC's UOH end size, always wait". For segments this makes sense because the segment UOH size is fairly stable, but with regions it's more likely for the end size to be much smaller because regions released a bunch of free regions back to the region free pool.

  • Don't apply this particular policy for regions, as it doesn't make sense there.
  • This kind of perf fix can always regress someone, and given the very different perf characteristics of the two implementations in this regard, it's not practical to keep the original behavior for everyone. So I'm adding a config for scenarios that allocate heavily on UOH and therefore can be paused during a BGC (see the sketch at the end of this description). If you're willing to accept larger UOH sizes in exchange for fewer pauses, you can use the UOHWaitBGCSizeIncPercent config to increase the wait ratio. Likewise, set it to a smaller ratio if you observe that UOH grows too large during BGCs. I will be submitting a doc PR for this.
  • Refactored various things kept separately for LOH and POH into UOH generations. I had a much more complicated fix before, and having to update both generations was tedious, so I took the opportunity to refactor. I didn't go with that fix because it regressed edge cases too much and there wasn't a good way to make it work without much riskier changes, but I kept the refactoring changes as they just make the code cleaner.
  • Fixed how the current UOH size is computed so it's updated correctly (it had a race condition, as bgc_uoh_alloc_clr and adjust_limit_clr would release the msl).
  • Got rid of the unproductive code in new_allocation_allowed and allocate_uoh.
  • Misc - fixed an error in a comment related to how free regions are aged.

Note that it would be better to add additional diagnostic info (in a separate PR) to indicate how much the UOH size has changed since the start of that BGC and how much UOH allocation was made during the BGC. Right now the size_before we set for UOH is larger than it actually is.
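For illustration, here is a minimal sketch of how the wait decision changes. The function and variable names, the exact growth check, and the values are hypothetical (the real logic lives in gc.cpp and may differ); only the "2x the last gen2 end size" segment policy and the UOHWaitBGCSizeIncPercent knob come from this PR.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical sketch: should a UOH-allocating thread wait for the BGC?
// wait_inc_percent stands in for the new UOHWaitBGCSizeIncPercent config.
bool should_wait_for_bgc (size_t current_uoh_size,
                          size_t bgc_begin_uoh_size,
                          size_t last_gen2_end_uoh_size,
                          bool   use_regions,
                          size_t wait_inc_percent)
{
    if (!use_regions)
    {
        // Legacy policy kept for segments but dropped for regions by this PR:
        // if the UOH size at the start of this BGC is already 2x the size at
        // the end of the last gen2 GC, always wait.
        if (bgc_begin_uoh_size >= 2 * last_gen2_end_uoh_size)
            return true;
    }

    // Otherwise only wait once UOH has grown by more than the allowed percentage
    // during this BGC (a smaller percentage means more waits but a smaller heap).
    size_t allowed_growth = bgc_begin_uoh_size / 100 * wait_inc_percent;
    return (current_uoh_size > (bgc_begin_uoh_size + allowed_growth));
}

int main ()
{
    // With regions: growth within the allowed percentage -> no wait; beyond it -> wait.
    printf ("%d\n", (int)should_wait_for_bgc (110, 100, 40, true, 20));  // 0
    printf ("%d\n", (int)should_wait_for_bgc (130, 100, 40, true, 20));  // 1
    return 0;
}
```

The actual decision in gc.cpp also depends on which BGC phase is in progress and on per-heap accounting; this sketch only captures the size-ratio part.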

…llocate UOH objects wait during a BGC

+ Don't apply the policy that says "if the begin UOH size is 2x the last gen2's UOH end size, always wait" for regions, as it doesn't make sense there - regions could have released a lot of memory to the UOH free lists in the last gen2 GC, so it's not uncommon for the begin UOH size to be a lot larger than the last gen2 GC's end size.
+ This kind of perf fix can always regress someone. Due to the very different perf characteristics of regions wrt this, it's not practical to keep the original behavior for everyone. So I'm adding a config for scenarios that allocate heavily on UOH and therefore can be paused during a BGC. If you're willing to accept larger UOH sizes in exchange for fewer pauses, you can use the UOHWaitBGCSizeIncPercent config to increase the wait ratio. Likewise, set it to a smaller ratio if you observe that UOH grows too large during BGCs. I will be submitting a doc PR for this.
+ Refactored various things kept separately for LOH and POH into UOH generations. I had a much more complicated fix before, and having to update both generations was tedious, so I took the opportunity to refactor. I didn't go with that fix because it regressed edge cases too much and there wasn't a good way to make it work without much riskier changes, but I kept the refactoring changes as they just make the code cleaner.
+ Fixed how the current UOH size is computed so it's updated correctly (it had a race condition, as `bgc_uoh_alloc_clr` and `adjust_limit_clr` would release the msl); see the sketch after the note below.
+ Got rid of the unproductive code in `new_allocation_allowed` and `allocate_uoh`.
+ Misc - fixed an error in a comment related to how free regions are aged.

Note that it would be better to add additional diagnostic info (in a separate PR) to indicate how much the UOH size has changed since the start of that BGC and how much UOH allocation was made during the BGC. Right now the `size_before` we set for UOH is larger than it actually is.
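As a side note on the size-accounting fix above, the following is an illustrative-only sketch of the race class involved, using standard C++ locks and made-up names. It is not the gc.cpp code, which uses the per-heap more-space lock (msl) and helpers like `bgc_uoh_alloc_clr`/`adjust_limit_clr` that can release that lock internally.

```cpp
#include <cstddef>
#include <mutex>

// Illustrative-only: how size bookkeeping can race when a helper drops the lock.
struct uoh_size_tracker
{
    std::mutex msl;                  // stand-in for the more-space lock
    size_t     current_uoh_size = 0;

    // Stand-in for a helper (e.g. clearing newly allocated memory) that
    // temporarily releases the lock so other allocating threads can proceed.
    void helper_that_releases_msl ()
    {
        msl.unlock();
        // ... long-running work done without the lock held ...
        msl.lock();
    }

    // Buggy shape: the size is snapshotted before the helper and written after,
    // so any update another thread makes in between is silently overwritten.
    void record_alloc_buggy (size_t size)
    {
        msl.lock();
        size_t snapshot = current_uoh_size;
        helper_that_releases_msl ();
        current_uoh_size = snapshot + size;
        msl.unlock();
    }

    // Fixed shape: finish the bookkeeping while the lock is continuously held,
    // then let the helper drop and retake it.
    void record_alloc_fixed (size_t size)
    {
        msl.lock();
        current_uoh_size += size;
        helper_that_releases_msl ();
        msl.unlock();
    }
};
```

The actual change in this PR differs in detail; the point is only that the accounting must not straddle the places where those helpers release the msl.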
@Copilot Copilot AI review requested due to automatic review settings August 11, 2025 03:31
@Copilot Copilot AI (Contributor) left a comment


Pull Request Overview

This PR fixes a regression in regions where UOH (User Old Heap, i.e. LOH plus POH) allocations during a Background GC (BGC) experienced significantly longer wait times compared to the legacy segment implementation. The core issue was a policy that forces allocation waits when the begin UOH size is 2x the last gen2's UOH end size - a policy that made sense for segments but not for regions due to their different memory management characteristics.

Key changes include:

  • Removing the problematic 2x size check policy for regions while maintaining it for segments
  • Adding a new configurable parameter UOHWaitBGCSizeIncPercent for fine-tuning allocation behavior during BGC
  • Refactoring separate LOH and POH handling into unified UOH generation arrays for cleaner code maintenance
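To make the last bullet concrete, here is a sketch of the consolidation pattern. The field and index names are made up for illustration and are not the actual gcpriv.h declarations; the idea is simply that parallel LOH/POH fields become arrays indexed by UOH generation.

```cpp
#include <cstddef>

// Illustrative-only: LOH/POH fields consolidated into per-UOH-generation arrays.
const int uoh_generation_count = 2;            // LOH and POH
enum uoh_index { uoh_idx_loh = 0, uoh_idx_poh = 1 };

// Before: parallel fields that always had to be updated in pairs.
struct heap_fields_before
{
    size_t bgc_begin_loh_size;
    size_t bgc_begin_poh_size;
    size_t end_loh_size;
    size_t end_poh_size;
};

// After: one array per concept, so code that used to be duplicated for LOH
// and POH can instead loop over the UOH generations.
struct heap_fields_after
{
    size_t bgc_begin_uoh_size[uoh_generation_count];
    size_t end_uoh_size[uoh_generation_count];
};
```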

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

  • src/coreclr/gc/gcpriv.h: Adds new UOH allocation action enum, consolidates separate LOH/POH fields into UOH arrays, updates comments for region aging
  • src/coreclr/gc/gcconfig.h: Introduces new UOHWaitBGCSizeIncPercent configuration parameter
  • src/coreclr/gc/gc.cpp: Core implementation changes including removal of legacy allocation logic, new BGC allocation decision logic, and UOH tracking consolidation


Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

@Maoni0 Maoni0 (Member, Author) commented Aug 11, 2025

tested with the following GCPerfSim cmdlines and env vars -

common cmdline -
GCPerfSim.dll -tc 8 -tagb 500 -tlgb 2 -ramb 20 -rlmb 2 -lohar 300 -pohar 0 -lohsi 0 -sohsi 50 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0
with Server GC, 4 heaps
bseg - baseline segments
breg - baseline regions
fseg - segments with fix (should be the same as bseg)
freg - regions with fix

LOHs0 - no additional config. breg observed many more and longer waits than the segment runs; freg observed no waits.

bseg0 observed    32 waits, total     2.58s, avg wait  80.6ms
bseg1 observed    80 waits, total     9.15s, avg wait 114.4ms
fseg0 observed    96 waits, total    12.31s, avg wait 128.2ms
fseg1 observed    88 waits, total     9.97s, avg wait 113.3ms
breg0 observed   833 waits, total   134.05s, avg wait 160.9ms
breg1 observed   789 waits, total   122.09s, avg wait 154.7ms
freg0 observed     0 waits, total     0.00s, avg wait   0.0ms
freg1 observed     0 waits, total     0.00s, avg wait   0.0ms

breg had a much smaller HeapSizeBefore; freg had a similar HeapSizeBefore to bseg/fseg. Note that breg also took noticeably longer.

image

LOHs0P0 - setting DOTNET_UOHWaitBGCSizeIncPercent to 0 basically reverts back to the breg behavior (breg had a period of time where it happened to not hit that policy, which made its HeapSizeBefore larger and its waits fewer than freg's) -

bseg0 observed    48 waits, total     4.06s, avg wait  84.5ms
bseg1 observed   216 waits, total    38.98s, avg wait 180.5ms
fseg0 observed    96 waits, total    10.14s, avg wait 105.7ms
fseg1 observed    63 waits, total     6.21s, avg wait  98.6ms
breg0 observed   831 waits, total   129.81s, avg wait 156.2ms
breg1 observed   838 waits, total   134.97s, avg wait 161.1ms
freg0 observed  1122 waits, total   180.63s, avg wait 161.0ms
freg1 observed  1111 waits, total   178.88s, avg wait 161.0ms
image

Setting DOTNET_UOHWaitBGCSizeIncPercent to larger than the default makes no difference, as there are already 0 waits with the default.

LOHs20 - setting -lohsi to 20 on the cmdline - breg already doesn't get many waits; freg gets 0 waits with very similar HeapSizeBefore -

bseg0 observed    24 waits, total     1.84s, avg wait  76.6ms
bseg1 observed    32 waits, total     2.44s, avg wait  76.2ms
fseg0 observed    24 waits, total     1.91s, avg wait  79.5ms
fseg1 observed    24 waits, total     1.77s, avg wait  73.7ms
breg0 observed    32 waits, total     2.27s, avg wait  70.9ms
breg1 observed    32 waits, total     2.35s, avg wait  73.3ms
freg0 observed     0 waits, total     0.00s, avg wait   0.0ms
freg1 observed     0 waits, total     0.00s, avg wait   0.0ms
image

Because this scenario does incur a noticeable heap size increase during a BGC, setting DOTNET_UOHWaitBGCSizeIncPercent to 0 of course has an effect, i.e., the heap size is a lot smaller with a lot more waits -

bseg0 observed    24 waits, total     1.74s, avg wait  72.6ms
bseg1 observed    32 waits, total     2.65s, avg wait  82.9ms
fseg0 observed    32 waits, total     2.55s, avg wait  79.5ms
fseg1 observed    32 waits, total     2.46s, avg wait  76.8ms
breg0 observed    24 waits, total     1.56s, avg wait  64.8ms
breg1 observed    32 waits, total     2.15s, avg wait  67.3ms
freg0 observed  1051 waits, total   148.01s, avg wait 140.8ms
freg1 observed  1048 waits, total   147.75s, avg wait 141.0ms
image

@Maoni0 Maoni0 (Member, Author) commented Aug 11, 2025

@mangod9 PTAL.

@Maoni0 Maoni0 enabled auto-merge (squash) August 12, 2025 03:48
@Maoni0 Maoni0 merged commit 811da24 into dotnet:main Aug 12, 2025
93 checks passed
@anderspedersen

@Maoni0
I am happy to see this issue fixed. I actually wrote about this exact issue a month ago, but I failed to realize that it was a regression caused by switching to regions.

Are there any plans to get this into .NET 10 and to backport it to .NET 9? We have removed the worst LOH allocating offenders from our code, but it is impossible to remove everything, which basically means that there will always be a non-zero chance that a request will be blocked for tens of seconds (we cache a lot of things, so our heap is very large).

@Maoni0 Maoni0 (Member, Author) commented Aug 13, 2025

@anderspedersen I took a brief look at your article. Great debugging! A couple of things -

  1. The same exact policy existed for segments as well, so you can see this with segments (the rationale is that during a BGC we can't collect LOH till the UOH sweep phase, which is at the very end of a BGC, so we don't want to risk increasing the LOH size by too much). But regions can make this show up more, as I described in the bug description.
  2. This problem is actually described as a table in GCStats in PerfView 😊 you'll see a table that says something like this
image
     Since you seem to enjoy debugging, you might be interested to know that this is based on the BGCAllocWaitStart/End events (see https://github.com/microsoft/perfview/blob/main/src/TraceEvent/Computers/TraceManagedProcess.cs#L1227).
  3. This is in .NET 10 now (it was mistakenly reverted by "[main] Source code updates from dotnet/dotnet" #118514, which was reverted with "Revert backflow and re-apply VMR build" #118657, so make sure you don't happen to get a build in between those 2 PRs if you want to try out a 10.0 daily build). As far as whether this can be backported to 9.0, it would require a customer request - @mangod9, can you please comment on the possibility of backporting this to 9.0?

@anderspedersen

@Maoni0
Thanks for taking the time to write a detailed answer 😊

  1. Yes, I know. As part of my original investigation, I checked the commit/PR history to see when this behavior was introduced (git commits/PR history can be a great tool for learning why code is the way it is), so I know the policy existed (at least) since .NET Core branched off from .NET Framework.
    What I meant by “I failed to realize it was a regression caused by switching to regions” is that I didn’t recognize the growing and shrinking of the LOH as new behavior introduced by regions.

  2. Thanks for pointing this out.

  3. Okay. We can wait for .NET 10 😊
