-
Notifications
You must be signed in to change notification settings - Fork 5.1k
[NativeAOT] Fix floating pointer register unwinding #117588
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes the floating-point register unwinding logic across several architectures by restricting valid ranges to the correct registers and replacing manual assertions/assignments with a bitcast helper.
- Introduces
unwindhelpers_bitcast
for safe type-punning viamemcpy
. - Updates
validFloatRegister
,getFloatRegister
, andsetFloatRegister
for ARM, ARM64, Loongarch64, and RISC-V to use the new bitcast and correct register ranges. - Removes legacy vector-register handling and replaces
PORTABILITY_ASSERT
withassert
for invalid-register checks.
Comments suppressed due to low confidence (1)
src/coreclr/nativeaot/Runtime/unix/UnwindHelpers.cpp:522
- Consider adding targeted unit tests for each architecture to verify that all valid floating-point registers are correctly unwound using the new
unwindhelpers_bitcast
implementation.
return unwindhelpers_bitcast<double>(D[num - UNW_ARM_D8]);
Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas |
@dotnet/samsung Could you please take a look? These changes may be related to riscv64. |
The unwinder tried to treat floating point registers as vector registers. It led to all sorts of problems with storage size (8 byte storage for double vs. 16 byte storage for vector registers). This bug was originally introduced by dotnet/corert#8290 . It is surprising that it took years to uncover it. The bug got copied from arm64 unwinder to arm, riscv and loongarch unwinders in various forms. I have fixed those as well. |
RISC-V Release-CLR-QEMU: 9083 / 9113 (99.67%)
report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V Release-CLR-VF2: 9084 / 9114 (99.67%)
report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V Release-FX-QEMU: 283771 / 284850 (99.62%)
report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V Release-FX-VF2: 309260 / 311009 (99.44%)
report.xml, report.md, failures.xml, testclr_details.tar.zst Build information and commandsGIT: |
/azp run runtime-nativeaot-outerloop |
Azure Pipelines successfully started running 1 pipeline(s). |
Wow, amazing work on getting to the bottom of it! (I guess I'll need to re-read the DWARF specs to figure out how platforms with overlapping vector and FP registers should behave. I assume that most platforms don't save the high bits of vector registers but I am not sure if that's universally true.) |
Right, it is the case for default calling conventions of all platforms that we support currently. |
It rechecked the specs and seems to be fine for ARM64 and LA64. RV64 ratified an optional vector calling convention last year (https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc#calling-convention-variant) so we may eventually need to support that. However, that was already broken prior to this PR and I don't have any hardware or toolchain that supports this. (cc @am11 FYI) |
I think we are not emitting RVV types from JIT and neither are we compiling with
Going by https://godbolt.org/z/d43s88nfE, at least it seems to know how to pass RVV types in v‑register. |
RISC-V Release-CLR-VF2: 9083 / 9113 (99.67%)
report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V Release-FX-QEMU: 283858 / 284941 (99.62%)
report.xml, report.md, failures.xml, testclr_details.tar.zst Build information and commandsGIT: |
It is in the same category as #8300 or #5040 . Also, RISCV vector extension is variable length like ARM SVE, so I expect we would want to finish implementing ARM SVE first and then base RISCV vector extension on that. |
/azp run runtime-nativeaot-outerloop |
Azure Pipelines successfully started running 1 pipeline(s). |
Right. I don't think there's currently a code path that could hit it. I was thinking more in the terms of unwinding native code like the GC poll code path. We likely cannot hit any vectorized code there (yet). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!
/ba-g known Android timeout |
Fixes #116276