jkotas
Member

@jkotas jkotas commented Jul 13, 2025

Fixes #116276

@Copilot Copilot AI review requested due to automatic review settings July 13, 2025 19:35
@jkotas jkotas requested a review from MichalStrehovsky as a code owner July 13, 2025 19:35
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR fixes the floating-point register unwinding logic across several architectures by restricting valid ranges to the correct registers and replacing manual assertions/assignments with a bitcast helper.

  • Introduces unwindhelpers_bitcast for safe type-punning via memcpy.
  • Updates validFloatRegister, getFloatRegister, and setFloatRegister for ARM, ARM64, LoongArch64, and RISC-V to use the new bitcast and correct register ranges.
  • Removes legacy vector-register handling and replaces PORTABILITY_ASSERT with assert for invalid-register checks.
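
For readers unfamiliar with the pattern, the sketch below shows what a memcpy-based bitcast helper and range-checked accessors of this kind might look like. It is not the code from the PR: `bitcast_sketch`, `FloatRegsSketch`, the `uint64_t` backing storage, and the register numbers 8 through 15 are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for unwindhelpers_bitcast; the real helper's
// signature is not shown in this conversation.
template <typename To, typename From>
To bitcast_sketch(const From& from)
{
    static_assert(sizeof(To) == sizeof(From), "source and target sizes must match");
    static_assert(std::is_trivially_copyable<To>::value, "To must be trivially copyable");
    static_assert(std::is_trivially_copyable<From>::value, "From must be trivially copyable");
    To to;
    std::memcpy(&to, &from, sizeof(To)); // copies the bytes without aliasing violations
    return to;
}

// Assumed layout: eight saved double registers kept as raw 64-bit values.
struct FloatRegsSketch
{
    uint64_t D[8];
};

// Assumed register numbering, chosen only for this example.
constexpr int kFirstSavedD = 8;
constexpr int kLastSavedD = 15;

bool validFloatRegisterSketch(int num)
{
    return num >= kFirstSavedD && num <= kLastSavedD;
}

double getFloatRegisterSketch(const FloatRegsSketch& regs, int num)
{
    assert(validFloatRegisterSketch(num)); // invalid registers are a programming error
    return bitcast_sketch<double>(regs.D[num - kFirstSavedD]);
}

void setFloatRegisterSketch(FloatRegsSketch& regs, int num, double value)
{
    assert(validFloatRegisterSketch(num));
    regs.D[num - kFirstSavedD] = bitcast_sketch<uint64_t>(value);
}
```

Going through memcpy rather than a pointer cast keeps the conversion well-defined, and the size check in the helper rejects any attempt to read a 16-byte value out of an 8-byte slot at compile time.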
Comments suppressed due to low confidence (1)

src/coreclr/nativeaot/Runtime/unix/UnwindHelpers.cpp:522

  • Consider adding targeted unit tests for each architecture to verify that all valid floating-point registers are correctly unwound using the new unwindhelpers_bitcast implementation.
    return unwindhelpers_bitcast<double>(D[num - UNW_ARM_D8]);

Contributor

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

@risc-vv

risc-vv commented Jul 13, 2025

@dotnet/samsung Could you please take a look? These changes may be related to riscv64.

@jkotas
Member Author

jkotas commented Jul 13, 2025

The unwinder tried to treat floating-point registers as vector registers. That led to all sorts of problems with storage size (8-byte storage for a double vs. 16-byte storage for a vector register).

This bug was originally introduced by dotnet/corert#8290. It is surprising that it took years to uncover it.

The bug got copied from the arm64 unwinder to the arm, riscv and loongarch unwinders in various forms. I have fixed those as well.
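
As a rough illustration of why the size mismatch matters, the sketch below does the arithmetic for one way it can go wrong: a 16-byte access into an array of 8-byte slots. The constants and the helper names `slotsTouched` and `accessFits` are hypothetical, and the actual failure modes in the unwinder may differ.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kDoubleSlotBytes = 8;  // storage reserved for one saved double
constexpr std::size_t kVectorBytes = 16;     // size of a 128-bit vector register

// How many consecutive slots an access of accessBytes covers
// when it starts at a slot boundary.
constexpr std::size_t slotsTouched(std::size_t accessBytes, std::size_t slotBytes)
{
    return (accessBytes + slotBytes - 1) / slotBytes;
}

// Whether an access of accessBytes starting at slot `index` stays
// inside a buffer made of slotCount slots of slotBytes each.
constexpr bool accessFits(std::size_t index, std::size_t accessBytes,
                          std::size_t slotCount, std::size_t slotBytes)
{
    return index * slotBytes + accessBytes <= slotCount * slotBytes;
}
```

With eight 8-byte slots, a double-sized access to the last slot fits, while a vector-sized access at the same slot runs 8 bytes past the end of the buffer, and a vector-sized access anywhere else spills into the neighbouring register's slot.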

@risc-vv

risc-vv commented Jul 13, 2025

RISC-V Release-CLR-QEMU: 9083 / 9113 (99.67%)
=======================
      passed: 9083
      failed: 2
     skipped: 597
      killed: 28
------------------------
 TOTAL tests: 9710
VIRTUAL time: 37h 33min 25s 850ms
   REAL time: 38min 15s 882ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

RISC-V Release-CLR-VF2: 9084 / 9114 (99.67%)
=======================
      passed: 9084
      failed: 2
     skipped: 597
      killed: 28
------------------------
 TOTAL tests: 9711
VIRTUAL time: 11h 58min 52s 922ms
   REAL time: 48min 22s 894ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

RISC-V Release-FX-QEMU: 283771 / 284850 (99.62%)
=======================
      passed: 283771
      failed: 1070
     skipped: 39
      killed: 9
------------------------
 TOTAL tests: 284889
VIRTUAL time: 32h 18min 24s 649ms
   REAL time: 1h 10min 32s 31ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

RISC-V Release-FX-VF2: 309260 / 311009 (99.44%)
=======================
      passed: 309260
      failed: 1741
     skipped: 39
      killed: 8
------------------------
 TOTAL tests: 311048
VIRTUAL time: 21h 18min 26s 682ms
   REAL time: 2h 10min 49s 503ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

Build information and commands

GIT: a76fcf3a5593a61df9555e5e1202244b46d29877
CI: d6c9c1ab3a7411819463edc05ded301e89ba586a
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

@jkotas
Member Author

jkotas commented Jul 13, 2025

/azp run runtime-nativeaot-outerloop


Azure Pipelines successfully started running 1 pipeline(s).

@filipnavara
Member

filipnavara commented Jul 13, 2025

Wow, amazing work on getting to the bottom of it!

(I guess I'll need to re-read the DWARF specs to figure out how platforms with overlapping vector and FP registers should behave. I assume that most platforms don't save the high bits of vector registers but I am not sure if that's universally true.)

@jkotas
Member Author

jkotas commented Jul 13, 2025

> I assume that most platforms don't save the high bits of vector registers but I am not sure if that's universally true.

Right, that is the case for the default calling conventions of all platforms that we currently support.

@filipnavara
Member

I rechecked the specs and it seems to be fine for ARM64 and LA64.

RV64 ratified an optional vector calling convention last year (https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc#calling-convention-variant) so we may eventually need to support that. However, that was already broken prior to this PR and I don't have any hardware or toolchain that supports this. (cc @am11 FYI)

@am11
Member

am11 commented Jul 14, 2025

> However, that was already broken prior to this PR

I think we are not emitting RVV types from JIT and neither are we compiling with rv64gcv (yet).

> I don't have any hardware or toolchain that supports this.

Going by https://godbolt.org/z/d43s88nfE, the compiler at least seems to know how to pass RVV types in v registers.

@risc-vv

risc-vv commented Jul 14, 2025

RISC-V Release-CLR-VF2: 9083 / 9113 (99.67%)
=======================
      passed: 9083
      failed: 2
     skipped: 597
      killed: 28
------------------------
 TOTAL tests: 9710
VIRTUAL time: 11h 5min 47s 145ms
   REAL time: 45min 18s 492ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

RISC-V Release-FX-QEMU: 283858 / 284941 (99.62%)
=======================
      passed: 283858
      failed: 1074
     skipped: 39
      killed: 9
------------------------
 TOTAL tests: 284980
VIRTUAL time: 32h 34min 23s 945ms
   REAL time: 1h 11min 0s 439ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

Build information and commands

GIT: 5543649c55213eef586b31081a059b12ee0af99f
CI: d6c9c1ab3a7411819463edc05ded301e89ba586a
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

@jkotas
Member Author

jkotas commented Jul 14, 2025

> RV64 ratified an optional vector calling convention last year (https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc#calling-convention-variant) so we may eventually need to support that.

It is in the same category as #8300 or #5040.

Also, the RISC-V vector extension is variable-length, like ARM SVE, so I expect we would want to finish implementing ARM SVE first and then base RISC-V vector extension support on that.

@jkotas jkotas requested a review from janvorli July 14, 2025 05:56
@jkotas
Member Author

jkotas commented Jul 14, 2025

/azp run runtime-nativeaot-outerloop


Azure Pipelines successfully started running 1 pipeline(s).

@filipnavara
Member

> I think we are not emitting RVV types from JIT and neither are we compiling with rv64gcv (yet).

Right. I don't think there's currently a code path that could hit it.

> It is in the same category as #8300 or #5040.

I was thinking more in terms of unwinding native code, such as the GC poll code path. We likely cannot hit any vectorized code there (yet).

Member

@janvorli janvorli left a comment

LGTM, thank you!

@jkotas
Member Author

jkotas commented Jul 15, 2025

/ba-g known Android timeout

@jkotas jkotas merged commit 6123c24 into dotnet:main Jul 15, 2025
111 of 119 checks passed
@jkotas jkotas deleted the fix-116276 branch July 15, 2025 12:09
@github-actions github-actions bot locked and limited conversation to collaborators Aug 15, 2025
Development

Successfully merging this pull request may close these issues.

TimeSpan overflowed because the duration is too long.
5 participants