-
Notifications
You must be signed in to change notification settings - Fork 5.1k
fixing a regression in regions related to UOH allocations during BGC #118567
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…llocate UOH objects wait during a BGC + don't apply that policy that says "if the begin UOH size is 2x the last gen2's UOH end size, always wait" for regions as it doesn't make sense - regions could have released a lot of memory to the UOH free lists in the last gen2 GC so it's not uncommon that the begin UOH size would be a lot larger than last gen2 GC's end size. + this kind of perf fixes can always regress someone. Due to the very different perf characteristics of regions wrt this, it's not practical to keep the original behavior for everyone. So I'm adding a config for scenarios that allocate heavily on UOH therefore can be paused during a BGC. If you're willing to accept larger UOH sizes in exchange for fewer pauses, you can use the UOHWaitBGCSizeIncPercent config to increase the wait ratio. Likewise, set it to use a smaller ratio if you observe that UOH grows too large during BGCs. I will be submitting a doc PR for this. + refactored various things kept separately for LOH and POH into UOH generations. I had a much more complicated fix before and having to update both generations was tedious so I took the opportunity to refactor. I didn't go with that fix (because it regressed edge cases too much and there wasn't a good way to make it work without way riskier changes. But I kept the refactoring changes as it just makes the code cleaner. + fixed how the current UOH size is computed so it's updated correctly (it had a racing condition as `bgc_uoh_alloc_clr` and `adjust_limit_clr` would release the msl). + got rid of the unproductive code in `new_allocation_allowed` and `allocate_uoh`. + misc - fixed an error in a comment related to how free regions are aged. note that it's better to add additional diagnostics info (in a separate PR), to indicate how much the UOH size had changed since the start of that BGC and how much UOH allocation was made during the BGC. Right now the `size_before` we set for UOH is larger than it actually is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes a regression in regions where UOH (Unmanaged Object Heap) allocations during a Background GC (BGC) experienced significantly longer wait times compared to the legacy segment implementation. The core issue was an inappropriate policy that would force allocation waits when begin UOH size was 2x the last generation 2's UOH end size - a policy that made sense for segments but not for regions due to their different memory management characteristics.
Key changes include:
- Removing the problematic 2x size check policy for regions while maintaining it for segments
- Adding a new configurable parameter
UOHWaitBGCSizeIncPercent
for fine-tuning allocation behavior during BGC - Refactoring separate LOH and POH handling into unified UOH generation arrays for cleaner code maintenance
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
File | Description |
---|---|
src/coreclr/gc/gcpriv.h | Adds new UOH allocation action enum, consolidates separate LOH/POH fields into UOH arrays, updates comments for region aging |
src/coreclr/gc/gcconfig.h | Introduces new UOHWaitBGCSizeIncPercent configuration parameter |
src/coreclr/gc/gc.cpp | Core implementation changes including removal of legacy allocation logic, new BGC allocation decision logic, and UOH tracking consolidation |
Comments suppressed due to low confidence (4)
Tagging subscribers to this area: @dotnet/gc |
@mangod9 PTAL. |
@Maoni0 Are there any plans to get this into .NET 10 and to backport it to .NET 9? We have removed the worst LOH allocating offenders from our code, but it is impossible to remove everything, which basically means that there will always be a non-zero chance that a request will be blocked for tens of seconds (we cache a lot of things, so our heap is very large). |
@anderspedersen I took a brief look at your article. great debugging! a couple of things -
![]()
|
@Maoni0
|
customer reported much longer/frequent waits on threads allocating UOH objects during a BGC with regions than with segments. this can be observed in the "LOH allocation pause (due to background GC) > 200 msec Events" table in the GCStats view in perfview. this is because we have a policy that says "if the begin UOH size is 2x the last gen2's UOH end size, always wait". for segments this makes sense because segments UOH size is fairly stable. but with regions it's more likely for the end size to be much smaller because regions released a bunch of free regions back to the region free pool.
bgc_uoh_alloc_clr
andadjust_limit_clr
would release the msl).new_allocation_allowed
andallocate_uoh
.note that it's better to add additional diagnostics info (in a separate PR), to indicate how much the UOH size had changed since the start of that BGC and how much UOH allocation was made during the BGC. Right now the
size_before
we set for UOH is larger than it actually is.