@topolarity topolarity commented Jul 5, 2024

This effectively expands our existing union ABI to cover both of these existing cases:

  • sret ABI (which can stack-allocate a single pointer-ful type)
  • union ABI (which can stack-allocate many pointer-free types)
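As a rough illustration of the two cases above, the sketch below classifies a return type by whether its union members can hold GC references. `classify_return` is a hypothetical helper and not part of codegen, and `isbitstype` is used only as an approximation of "pointer-free"; the compiler's actual ABI decision involves more conditions than this.

```julia
# Hypothetical helper for illustration only; codegen's real checks differ.
function classify_return(T)
    members = Base.uniontypes(T)            # [T] when T is not a Union
    pointerfree = all(isbitstype, members)  # approximation of "pointer-free"
    if length(members) == 1
        return pointerfree ? :plain_bits : :single_pointerful
    end
    return pointerfree ? :pointerfree_union : :pointerful_union
end

classify_return(Some{Vector{Any}})                  # one pointer-ful type (sret case)
classify_return(Union{Nothing, Int})                # pointer-free union (union case)
classify_return(Union{Nothing, Some{Vector{Any}}})  # pointer-ful union (the new case)
```

The last call corresponds to the return type of `maybe_wrapped` in the benchmark, which fits neither of the two existing ABIs.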

This provides some nice speed-ups for temporary "wrappers":

const v = Any[]
@noinline maybe_wrapped(i) = (i % 32 != 0) ? Some(v) : nothing
function foo()
    count = 0
    for i = 1:1_000_000
        count += (maybe_wrapped(i) !== nothing) ? 1 : 0
    end
    return count
end

On this PR this gives:

julia> @btime foo()
  1.675 ms (0 allocations: 0 bytes)
968750

compared to current master:

julia> @btime foo()
  6.877 ms (968750 allocations: 14.78 MiB)
968750
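For anyone who wants to check the allocation behaviour without BenchmarkTools, a minimal sketch using only Base's `@allocated` could look like the following. The `bytes` variable is just for illustration, and the number it reports depends on which build of Julia runs it, so no particular value is assumed here.

```julia
const v = Any[]
@noinline maybe_wrapped(i) = (i % 32 != 0) ? Some(v) : nothing

function foo()
    count = 0
    for i = 1:1_000_000
        count += (maybe_wrapped(i) !== nothing) ? 1 : 0
    end
    return count
end

foo()                      # warm up so compilation is not measured
bytes = @allocated foo()   # heap bytes allocated by a single call
println("result = ", foo(), ", allocated bytes = ", bytes)
```

The returned count is 968750 either way, since 31250 of the 1,000,000 indices are multiples of 32; only the allocation figure is expected to differ between builds.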

TODO:

The most outstanding TODO here is what to do about ϕ-nodes. Right now, if the incoming Union{...} type has a pointer-containing type then this change forces the incoming object to be boxed, even if the object at run-time is actually pointer-free.

But that's just a band-aid so the code works - it introduces new boxes where we didn't have them before, which is a regression that almost certainly needs to be fixed before landing this.
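To make the ϕ-node concern concrete, here is a small sketch of the kind of code where two branches merge into a `Union` with one pointer-containing member and one pointer-free member. The names `shared`, `wrapped`, `plain`, and `merge_example` are made up for this example, and whether a ϕ-node actually survives optimization depends on the Julia version.

```julia
const shared = Any[]
@noinline wrapped(i) = Some(shared)  # pointer-containing union member
@noinline plain(i) = i               # pointer-free union member

function merge_example(i::Int)
    # Where the two branches join, x has type Union{Int, Some{Vector{Any}}}.
    x = isodd(i) ? wrapped(i) : plain(i)
    return x isa Some ? 1 : 0
end

merge_example(1)  # takes the pointer-containing branch
merge_example(2)  # takes the pointer-free branch
```

Calling `code_typed(merge_example, (Int,))` is one way to see whether the merged value is represented as a ϕ-node in the optimized IR. On the even branch the run-time value is a plain `Int`, which is the situation where forced boxing would be a regression.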

@JeffBezanson JeffBezanson added performance Must go faster compiler:codegen Generation of LLVM IR and native code labels Jul 5, 2024
@topolarity topolarity force-pushed the ct/union-sret-abi branch 2 times, most recently from 3ad2529 to e33c792 Compare July 5, 2024 18:10
vtjnash added a commit that referenced this pull request Sep 4, 2024
@vtjnash vtjnash force-pushed the ct/union-sret-abi branch 3 times, most recently from 20bc368 to 1f8932f Compare September 23, 2025 21:01
@topolarity topolarity force-pushed the ct/union-sret-abi branch 5 times, most recently from 0bd445a to 9c008c1 Compare October 16, 2025 15:33
@topolarity topolarity force-pushed the ct/union-sret-abi branch 3 times, most recently from 3153817 to f6762fa Compare October 24, 2025 20:29
@topolarity topolarity force-pushed the ct/union-sret-abi branch 3 times, most recently from 5530839 to eec16be Compare November 7, 2025 20:54

// bitcast whatever Ptr kind x might be (even if it is part of a union) into Ptr{Cvoid}
// given that the caller already had emit_cpointercheck on this branch, so that
// the conversion is guaranteed to be valid on this runtiem branch
Suggested change
- // the conversion is guaranteed to be valid on this runtiem branch
+ // the conversion is guaranteed to be valid on this runtime branch

@vtjnash

This comment was marked as outdated.

@vtjnash

This comment was marked as outdated.

@vtjnash vtjnash changed the title WIP: implement sret_union ABI for pointer-ful types implement sret_union ABI for pointer-ful types Nov 17, 2025
topolarity and others added 3 commits November 17, 2025 17:19
This is a combination of the existing:
 - `sret`  ABI (which can stack-allocate a _single_ pointerful type)
 - `union` ABI (which can stack-allocate many _pointer-free_ types)

This provides some nice speed-ups for temporary "wrappers":
```julia
const v = Any[]
@noinline maybe_wrapped(i) = (i % 32 != 0) ? Some(v) : nothing
function foo()
    count = 0
    for i = 1:1_000_000
        count += (maybe_wrapped(i) !== nothing) ? 1 : 0
    end
    return count
end
```

On this PR this gives:
```julia
julia> @btime foo()
  1.675 ms (0 allocations: 0 bytes)
968750
```

compared to current master:
```julia
julia> @btime foo()
  6.877 ms (968750 allocations: 14.78 MiB)
968750
```

The most outstanding TODO here is what to do about PHI nodes. Right now,
if the incoming `Union{...}` type has a pointer-containing type then the
object is forced to be boxed, even if the object at run-time is actually
pointer-free.

But that's just a band-aid - it means we introduce new boxes where we
didn't have them before, which is a regression that almost certainly
needs to be fixed before landing this.

Co-authored-by: Gabriel Baraldi <[email protected]>
always need to call typeassert / update type
topolarity and others added 4 commits November 17, 2025 17:19
A concrete-typed `cgval_t` can only be statically boxed or unboxed,
while a `union-split` value can dynamically be either.
This variable was accidentally dropped in the change to use the
"julia.return_roots" attribute.
… where the next crash seems likely to be caused by
@vtjnash
vtjnash commented Nov 18, 2025

Hopefully getting closer now. Still hitting a segfault when trying to read the parent field of this OffsetVector after the phic reload:

julia> isdefined(Main, :OffsetArrays) || @eval Main include("testhelpers/OffsetArrays.jl"); using .Main.OffsetArrays

julia> testf() = for a in @noinline (["foo", "Bar"], OffsetVector(["foo", "Bar"], 0:1))
               try; error(); catch; end
               @noinline a == a
           end
testf (generic function with 1 method)

julia> testf()

[3998236] signal 11 (1): Segmentation fault
in expression starting at REPL[6]:1
size at ./essentials.jl:10 [inlined]
axes at ./abstractarray.jl:102 [inlined]
axes at /home/vtjnash/julia/test/testhelpers/OffsetArrays.jl:504 [inlined]
== at ./abstractarray.jl:3049
testf at ./REPL[5]:3

@KristofferC KristofferC added the needs pkgeval Tests for all registered packages should be run with this change label Nov 18, 2025