Skip to content

Conversation

@amanasifkhalid
Copy link
Contributor

Given some branch curr -> succ, fgSplitEdge introduces an intermediary block to create the form curr -> newBlock -> succ. The lexical placement of newBlock wouldn't matter if it weren't for the fact that we call fgSplitEdge during LSRA, after we've reordered blocks. Ideally, we wouldn't introduce any new blocks after establishing layout, but for the time being, always placing newBlock after curr to create fallthrough -- effectively moving the curr -> succ branch up a block, and not introducing new branches -- seems to be a decent compromise.

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 5, 2024
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@amanasifkhalid
Copy link
Contributor Author

cc @dotnet/jit-contrib, @AndyAyersMS PTAL. Diffs are large, though they're inflated by collections with tiering (libraries_tests in particular). This change in placement is particularly impactful when we aren't optimizing, so we can't rely on block layout to fix things later -- though it looks like there are plenty of instances where LSRA introduces a new block after layout when we are optimizing, and this new placement is better at maintaining fallthrough.

@AndyAyersMS
Copy link
Member

Seems like perhaps fgSplitEdge should behave differently before/after layout? Before the placement is not important, after, it is...?

That being said, I have seen those LSRA blocks end up very poorly placed, and we might question whether after source or before target is the right location for the new block, when source and target blocks are not adjacent, or whether we just (as we've discussed) redo layout if LSRA makes changes.

@amanasifkhalid
Copy link
Contributor Author

I suppose we could differentiate between before/after layout, though since we're seeing this block placement affect FullOpts with some frequency, it might be more worthwhile to get layout working after LSRA.

That being said, I have seen those LSRA blocks end up very poorly placed, and we might question whether after source or before target is the right location for the new block

Between the two, I think placing after source is better most of the time? If the target has multiple preds, placing before target could potentially break up hotter fallthrough. If the source has only one successor, then placing after the source doesn't change anything, but if the source is a BBJ_COND, then the worst-case scenario is we had some layout like source, falseTarget, target, and now we have source, newBlock, falseTarget, target; we no longer have fallthrough into either successor of source.

If you'd like, I can put this PR on-hold and work on getting layout working after LSRA. For that, I'm thinking we can continue to run layout before LSRA, and detect if we need to run it again after, just to minimize regressions. Once we've refactored switch recognition to not depend on lexical layout (#107076), we should be able to only run layout once, after LSRA.

@AndyAyersMS
Copy link
Member

I think it makes sense to look into fixing layout post LSRA first. If that turns out to be complicated, then perhaps we can reconsider and do this first.

I believe LSRA will only split "critical edges"— edges where the source has multiple flow successors and the target has multiple flow predecessors. If we trust the initial layout then if source and target are adjacent then putting the block in between is the right thing; if not, presumably both source and target have other preferred partners and if those partners are adjacent then the new block should be placed out of the way somewhere where it won't break up any other important flow (though not necessarily at the end of the method / region, which is what I commonly see). If the preferred source or target partner is not adjacent (or maybe if no successor/predecessor is adjacent), then we should be placing after source or before target depending.

This calculus would be easier if we hand the flow scoring that we are envisioning for k-opt, as we could evaluate the different options and pick the one that has the best score.

@amanasifkhalid
Copy link
Contributor Author

With #107483 merged, the diffs are still big, though FullOpts diffs are concentrated in coreclr_tests and libraries_tests. After looking at the JIT dumps for a few examples, I see that fgSplitEdge's block placement during LSRA, as expected, is no longer meaningful in FullOpts, as block layout handles it. But the call site during profile incorporation is now the source of diffs, thanks to various early phases having dependencies on lexical block ordering. fgUpdateFlowGraph is one obvious culprit -- I can look into refactoring that next.

@AndyAyersMS are you ok taking this change as-is, or would you prefer we whittle down the churn as much as possible?

@Copilot Copilot AI review requested due to automatic review settings August 5, 2025 19:26
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR modifies the JIT's fgSplitEdge method to better maintain fallthrough when splitting edges during LSRA (Linear Scan Register Allocation). The key improvement is placing the new intermediary block more strategically to preserve existing control flow patterns and avoid introducing unnecessary branches.

  • Improves block placement strategy in fgSplitEdge to maintain fallthrough when possible
  • Simplifies weight calculation logic by using existing predecessor edge weight directly
  • Updates documentation to reflect the cleaner implementation

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/coreclr/jit/fgbasic.cpp Refactors fgSplitEdge to place new blocks more strategically and simplifies weight calculation
src/coreclr/jit/lsra.cpp Updates comment to reflect simplified edge splitting behavior

@amanasifkhalid
Copy link
Contributor Author

I decided to take another look at this, now that our block layout story is mature. The diffs show large size decreases for both optimized and unoptimized code, though the bulk of these savings are from instrumented tiers. Here's the breakdown:

  • LSRA calls fgSplitEdge to resolve critical edges. When optimizing, the placement of the new block doesn't matter since block layout will fix it, unless the block is cold, in which case layout ignores it entirely. Thus, placing the new block strategically can yield more compact cold code.
  • On x64 in Tier1-Instrumented, there are several instances where we now call CORINFO_HELP_COUNTPROFILE32 with a 32-bit argument instead of a 64-bit one, and vice versa. These diffs look spurious, since they have more to do with the profile data's location in memory than the code layout.
  • fgSplitEdge is also called when inserting instrumentation probes. In Tier0-Instrumented, since blocks won't be reordered, any fallthrough created by fgSplitEdge will be reflected by the final layout.

@AndyAyersMS the churn might be too large to take at this point, but I think this is worth taking at some point, considering the potential size savings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants