Commit 81fb8ec

Add support for ARM
1 parent ba5bb38 commit 81fb8ec

6 files changed: +320 -9 lines changed

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ libc = "0.2.14"
 optional = true
 version = "1.0.0"
 
-[dev-dependencies]
+[target.'cfg(any(target_arch = "x86", target_arch = "x86_64", target_arch = "aarch64"))'.dev-dependencies]
 simd = "0.1"
 
 [features]

README.md

Lines changed: 8 additions & 6 deletions
@@ -118,7 +118,7 @@ there should be at least 8 KiB of free stack space, or panicking will result in
 
 ## Limitations
 
-The architectures currently supported are: x86, x86_64, aarch64, or1k.
+The architectures currently supported are: x86, x86_64, aarch64, arm, or1k.
 
 The platforms currently supported are: bare metal, Linux (any libc),
 FreeBSD, DragonFly BSD, macOS.
@@ -176,13 +176,15 @@ of callee-saved registers.
 
 ### Call stack splicing
 
-Non-Windows platforms use [DWARF][] for both stack unwinding and debugging. DWARF call frame
-information is very generic to be ABI-agnostic—it defines a bytecode that describes the actions
-that need to be performed to simulate returning from a function. libfringe uses this bytecode
-to specify that, after the generator function has returned, execution continues at the point
-where the generator function was resumed the last time.
+Non-Windows platforms use [DWARF][] (or the highly similar [ARM EHABI][ehabi]) for both stack
+unwinding and debugging. DWARF call frame information is very generic to be ABI-agnostic—
+it defines a bytecode that describes the actions that need to be performed to simulate
+returning from a function. libfringe uses this bytecode to specify that, after the generator
+function has returned, execution continues at the point where the generator function was
+resumed the last time.
 
 [dwarf]: http://dwarfstd.org
+[ehabi]: http://infocenter.arm.com/help/topic/com.arm.doc.ihi0038b/IHI0038B_ehabi.pdf
 
 ## Windows compatibility
 
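Reviewer's sketch: the README paragraph above describes call frame information as rules for simulating a return from a function. The following is a minimal model of one such step, assuming the rule set that src/arch/arm.rs in this commit declares for its trampolines (`.cfi_def_cfa fp, 8`, `.cfi_offset lr, -4`, `.cfi_offset fp, -8`). The names `Regs` and `unwind_step` and the `HashMap`-based memory are hypothetical and exist only for illustration; they are not part of libfringe or of any unwinder.

```rust
use std::collections::HashMap;

/// Registers that matter when unwinding one frame (hypothetical model).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Regs {
    pub fp: u32,
    pub lr: u32,
    pub sp: u32,
}

/// Simulate returning from a frame described by the rules
///   CFA = fp + 8, saved lr at CFA - 4, saved fp at CFA - 8.
/// `mem` is a toy memory keyed by byte address, holding 32-bit words.
/// Returns `None` if an address overflows or a slot is missing.
pub fn unwind_step(regs: Regs, mem: &HashMap<u32, u32>) -> Option<Regs> {
    // The CFA is defined relative to the frame pointer.
    let cfa = regs.fp.checked_add(8)?;
    // Restore the caller's return address and frame pointer from the frame.
    let lr = *mem.get(&(cfa - 4))?;
    let fp = *mem.get(&(cfa - 8))?;
    // After the simulated return, the stack pointer equals the CFA.
    Some(Regs { fp, lr, sp: cfa })
}
```

With these rules, making `fp` point into another stack is enough to redirect the unwinder there, which is the effect the call stack splicing section describes.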

src/arch/arm.rs

Lines changed: 304 additions & 0 deletions
@@ -0,0 +1,304 @@
+// This file is part of libfringe, a low-level green threading library.
+// Copyright (c) Nathan Zadoks <[email protected]>,
+//               whitequark <[email protected]>
+//               Amanieu d'Antras <[email protected]>
+// Licensed under the Apache License, Version 2.0, <LICENSE-APACHE or
+// http://apache.org/licenses/LICENSE-2.0> or the MIT license <LICENSE-MIT or
+// http://opensource.org/licenses/MIT>, at your option. This file may not be
+// copied, modified, or distributed except according to those terms.
+
+// To understand the machine code in this file, keep in mind these facts:
+// * ARM AAPCS ABI passes the first argument in r0. We also use r0 to pass a value
+//   while swapping context; this is an arbitrary choice
+//   (we clobber all registers and could use any of them) but this allows us
+//   to reuse the swap function to perform the initial call.
+//
+// To understand the ARM EHABI CFI code in this file, keep in mind these facts:
+// * CFI is "call frame information"; a set of instructions to a debugger or
+//   an unwinder that allow it to simulate returning from functions. This implies
+//   restoring every register to its pre-call state, as well as the stack pointer.
+// * CFA is "call frame address"; the value of stack pointer right before the call
+//   instruction in the caller. Everything strictly below CFA (and inclusive until
+//   the next CFA) is the call frame of the callee. This implies that the return
+//   address is part of the callee's call frame.
+// * Logically, ARM EHABI CFI is a table where rows are instruction pointer values and
+//   columns describe where registers are spilled (mostly using expressions that
+//   compute a memory location as CFA+n). A .save pseudoinstruction changes
+//   the state of a column for all IP numerically larger than the one it's placed
+//   after. The .pad and .setfp pseudoinstructions change the CFA value similarly.
+// * Simulating return is as easy as restoring register values from the CFI table
+//   and then setting stack pointer to CFA.
+//
+// A high-level overview of the function of the trampolines is:
+// * The 2nd init trampoline puts a controlled value (written in swap to `new_cfa`)
+//   into r11. This is then used as the CFA for the 1st trampoline.
+// * This controlled value points to the bottom of the stack of the parent context,
+//   which holds the saved r11 and lr from the call to swap().
+// * The 1st init trampoline tells the unwinder to restore r11 and lr
+//   from the stack frame at r11 (in the parent stack), thus continuing
+//   unwinding at the swap call site instead of falling off the end of context stack.
+use core::mem;
+use arch::StackPointer;
+use unwind;
+
+pub const STACK_ALIGNMENT: usize = 8;
+
+pub unsafe fn init(stack_base: *mut u8, f: unsafe fn(usize, StackPointer)) -> StackPointer {
+  #[cfg(not(target_vendor = "apple"))]
+  #[naked]
+  unsafe extern "C" fn trampoline_1() {
+    asm!(
+      r#"
+        # gdb has a hardcoded check that rejects backtraces where frame addresses
+        # do not monotonically decrease. It is turned off if the function is called
+        # "__morestack" and that is hardcoded. So, to make gdb backtraces match
+        # the actual unwinder behavior, we call ourselves "__morestack" and mark
+        # the symbol as local; it shouldn't interfere with anything.
+      __morestack:
+      .local __morestack
+
+        # Set up the first part of our ARM EHABI CFI linking stacks together. When
+        # we reach this function from unwinding, r11 will be pointing at the bottom
+        # of the parent linked stack. This link is set each time swap() is called.
+        # When unwinding the frame corresponding to this function, an ARM EHABI unwinder
+        # will use r11+8 as the next call frame address, restore return address (lr)
+        # from CFA-4 and restore r11 from CFA-8. This mirrors what the second half
+        # of `swap_trampoline` does.
+        # .setfp fp, sp
+        # .save {fp, lr}
+        .cfi_def_cfa fp, 8
+        .cfi_offset lr, -4
+        .cfi_offset fp, -8
+
+        # This nop is here so that the initial swap doesn't return to the start
+        # of the trampoline, which confuses the unwinder since it will look for
+        # frame information in the previous symbol rather than this one. It is
+        # never actually executed.
+        nop
+
+      .Lend:
+      .size __morestack, .Lend-__morestack
+      "#
+      : : : : "volatile")
+  }
85+
#[cfg(target_vendor = "apple")]
86+
#[naked]
87+
unsafe extern "C" fn trampoline_1() {
88+
asm!(
89+
r#"
90+
# Identical to the above, except avoids .local/.size that aren't available on Mach-O.
91+
__morestack:
92+
.private_extern __morestack
93+
# .setfp fp, sp
94+
# .save {fp, lr}
95+
.cfi_def_cfa fp, 8
96+
.cfi_offset lr, -4
97+
.cfi_offset fp, -8
98+
nop
99+
"#
100+
: : : : "volatile")
101+
}
+
+  #[naked]
+  unsafe extern "C" fn trampoline_2() {
+    asm!(
+      r#"
+        # Set up the second part of our ARM EHABI CFI.
+        # When unwinding the frame corresponding to this function, a DWARF unwinder
+        # will restore r11 (and thus CFA of the first trampoline) from the stack slot.
+        # This stack slot is updated every time swap() is called to point to the bottom
+        # of the stack of the context just switched from.
+        # .setfp fp, sp
+        # .save {fp, lr}
+        .cfi_def_cfa fp, 8
+        .cfi_offset lr, -4
+        .cfi_offset fp, -8
+
+        # This nop is here so that the return address of the swap trampoline
+        # doesn't point to the start of the symbol. This confuses gdb's backtraces,
+        # causing them to think the parent function is trampoline_1 instead of
+        # trampoline_2.
+        nop
+
+        # Call unwind_wrapper with the provided function and the stack base address.
+        add r2, sp, #16
+        ldr r3, [sp, #8]
+        bl ${0}
+
+        # Restore the stack pointer of the parent context. No CFI adjustments
+        # are needed since we have the same stack frame as trampoline_1.
+        ldr sp, [sp]
+
+        # Load frame and instruction pointers of the parent context.
+        pop {fp, lr}
+        .cfi_adjust_cfa_offset -8
+        .cfi_restore fp
+        .cfi_restore lr
+
+        # If the returned value is nonzero, trigger an unwind in the parent
+        # context with the given exception object.
+        cmp r0, #0
+        bne ${1}
+
+        # Clear the stack pointer. We can't call into this context any more once
+        # the function has returned.
+        mov r1, #0
+
+        # Return into the new context. Use `r12` instead of `lr` to avoid
+        # return address mispredictions.
+        mov r12, lr
+        bx r12
+      "#
+      :
+      : "s" (unwind::unwind_wrapper as usize)
+        "s" (unwind::start_unwind as usize)
+      : : "volatile")
+  }
+
+  // We set up the stack in a somewhat special way so that to the unwinder it
+  // looks like trampoline_1 has called trampoline_2, which has in turn called
+  // swap::trampoline.
+  //
+  // There are 2 call frames in this setup, each containing the return address
+  // followed by the r11 value for that frame. This setup supports unwinding
+  // using DWARF CFI as well as the frame pointer-based unwinding used by tools
+  // such as perf or dtrace.
+  let mut sp = StackPointer::new(stack_base);
+
+  sp.push(0 as usize); // Padding to ensure the stack is properly aligned
+  sp.push(f as usize); // Function that trampoline_2 should call
+
+  // Call frame for trampoline_2. The CFA slot is updated by swap::trampoline
+  // each time a context switch is performed.
+  sp.push(trampoline_1 as usize + 4); // Return after the nop
+  sp.push(0xdead0cfa);                // CFA slot
+
+  // Call frame for swap::trampoline. We set up the r11 value to point to the
+  // parent call frame.
+  let frame = sp.offset(0);
+  sp.push(trampoline_2 as usize + 4); // Entry point, skip initial nop
+  sp.push(frame as usize);            // Pointer to parent call frame
+
+  sp
+}
+
+#[inline(always)]
+pub unsafe fn swap_link(arg: usize, new_sp: StackPointer,
+                        new_stack_base: *mut u8) -> (usize, Option<StackPointer>) {
+  let ret: usize;
+  let ret_sp: usize;
+  asm!(
+    r#"
+      # Set up the link register
+      adr lr, 0f
+
+      # Save the frame pointer and link register; the unwinder uses them to find
+      # the CFA of the caller, and so they have to have the correct value immediately
+      # after the call instruction that invoked the trampoline.
+      push {fp, lr}
+
+      # Pass the stack pointer of the old context to the new one.
+      mov r1, sp
+
+      # Link the call stacks together by writing the current stack bottom
+      # address to the CFA slot in the new stack.
+      str sp, [r3, #-16]
+
+      # Load stack pointer of the new context.
+      mov sp, r2
+
+      # Load frame and instruction pointers of the new context.
+      pop {fp, r12}
+
+      # Return into the new context. Use `r12` instead of `lr` to avoid
+      # return address mispredictions.
+      bx r12
+
+    0:
+    "#
+    : "={r0}" (ret)
+      "={r1}" (ret_sp)
+    : "{r0}" (arg)
+      "{r2}" (new_sp.0)
+      "{r3}" (new_stack_base)
+    :/*r0, r1,*/ "r2", "r3", "r4", "r5", "r6", "r7",
+      "r8", "r9", "r10",/*r11,*/"r12",/*sp,*/ "lr", /*pc,*/
+      "d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7",
+      "d8", "d9", "d10", "d11", "d12", "d13", "d14", "d15",
+      "d16", "d17", "d18", "d19", "d20", "d21", "d22", "d23",
+      "d24", "d25", "d26", "d27", "d28", "d29", "d30", "d31",
+      "cc", "memory"
+    : "volatile");
+  (ret, mem::transmute(ret_sp))
+}
+
+#[inline(always)]
+pub unsafe fn swap(arg: usize, new_sp: StackPointer) -> (usize, StackPointer) {
+  // This is identical to swap_link, but without the write to the CFA slot.
+  let ret: usize;
+  let ret_sp: usize;
+  asm!(
+    r#"
+      adr lr, 0f
+      push {fp, lr}
+      mov r1, sp
+      mov sp, r2
+      pop {fp, r12}
+      bx r12
+    0:
+    "#
+    : "={r0}" (ret)
+      "={r1}" (ret_sp)
+    : "{r0}" (arg)
+      "{r2}" (new_sp.0)
+    :/*r0, r1,*/ "r2", "r3", "r4", "r5", "r6", "r7",
+      "r8", "r9", "r10",/*r11,*/"r12",/*sp,*/ "lr", /*pc,*/
+      "d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7",
+      "d8", "d9", "d10", "d11", "d12", "d13", "d14", "d15",
+      "d16", "d17", "d18", "d19", "d20", "d21", "d22", "d23",
+      "d24", "d25", "d26", "d27", "d28", "d29", "d30", "d31",
+      "cc", "memory"
+    // We need the "alignstack" attribute here to ensure that the stack is
+    // properly aligned if a call to start_unwind needs to be injected into
+    // our stack context.
+    : "volatile", "alignstack");
+  (ret, mem::transmute(ret_sp))
+}
+
+#[inline(always)]
+pub unsafe fn unwind(new_sp: StackPointer, new_stack_base: *mut u8) {
+  // Argument to pass to start_unwind, based on the stack base address.
+  let arg = unwind::unwind_arg(new_stack_base);
+
+  // This is identical to swap_link, except that it performs a tail call to
+  // start_unwind instead of returning into the target context.
+  asm!(
+    r#"
+      adr lr, 0f
+      push {fp, lr}
+      str sp, [r3, #-16]
+      mov sp, r2
+      pop {fp, r12}
+
+      # Jump to the start_unwind function, which will force a stack unwind in
+      # the target context. This will eventually return to us through the
+      # stack link.
+      b ${0}
+
+    0:
+    "#
+    :
+    : "s" (unwind::start_unwind as usize)
+      "{r0}" (arg)
+      "{r2}" (new_sp.0)
+      "{r3}" (new_stack_base)
+    : "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
+      "r8", "r9", "r10",/*r11,*/"r12",/*sp,*/ "lr", /*pc,*/
+      "d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7",
+      "d8", "d9", "d10", "d11", "d12", "d13", "d14", "d15",
+      "d16", "d17", "d18", "d19", "d20", "d21", "d22", "d23",
+      "d24", "d25", "d26", "d27", "d28", "d29", "d30", "d31",
+      "cc", "memory"
+    : "volatile");
+}
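Reviewer's sketch: the stack layout that `init` builds is easier to check against the hard-coded offsets in the assembly (`str sp, [r3, #-16]`, `ldr r3, [sp, #8]`, `add r2, sp, #16`) when written out as a model. The code below replays the six pushes on a toy descending stack of 32-bit words. `StackModel` and `init_layout` are hypothetical names used only here, the model assumes 4-byte pointers as on 32-bit ARM, and it does not touch real memory or libfringe's `StackPointer`.

```rust
/// Toy descending stack of 32-bit words (hypothetical; for illustration only).
pub struct StackModel {
    sp: u32,
    words: Vec<(u32, u32)>, // (byte address, value)
}

impl StackModel {
    pub fn new(base: u32) -> Self {
        StackModel { sp: base, words: Vec::new() }
    }

    /// Decrement the stack pointer by one word, then store `value` there.
    pub fn push(&mut self, value: u32) {
        self.sp -= 4;
        self.words.push((self.sp, value));
    }

    pub fn sp(&self) -> u32 {
        self.sp
    }

    /// Read back the word stored at `addr`, if any.
    pub fn read(&self, addr: u32) -> Option<u32> {
        self.words.iter().find(|&&(a, _)| a == addr).map(|&(_, v)| v)
    }
}

/// Replay the pushes performed by `init`, using placeholder addresses
/// for the function and the two trampolines.
pub fn init_layout(base: u32, f: u32, trampoline_1: u32, trampoline_2: u32) -> StackModel {
    let mut s = StackModel::new(base);
    s.push(0);                // base - 4:  alignment padding
    s.push(f);                // base - 8:  function for trampoline_2
    s.push(trampoline_1 + 4); // base - 12: return address, after the nop
    s.push(0xdead0cfa);       // base - 16: CFA slot, rewritten on each swap_link
    let frame = s.sp();       // parent call frame starts at the CFA slot
    s.push(trampoline_2 + 4); // base - 20: entry point, popped into r12
    s.push(frame);            // base - 24: frame pointer, popped into fp
    s
}
```

In this model the CFA slot sits at `base - 16`, matching the store in `swap_link`, and once the first swap has popped two words the function pointer is 8 bytes above the stack pointer, matching the load in `trampoline_2`.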

src/arch/mod.rs

Lines changed: 3 additions & 0 deletions
@@ -13,6 +13,7 @@ use core::nonzero::NonZero;
 #[cfg_attr(target_arch = "x86", path = "x86.rs")]
 #[cfg_attr(target_arch = "x86_64", path = "x86_64.rs")]
 #[cfg_attr(target_arch = "aarch64", path = "aarch64.rs")]
+#[cfg_attr(target_arch = "arm", path = "arm.rs")]
 #[cfg_attr(target_arch = "or1k", path = "or1k.rs")]
 mod imp;
 
@@ -40,6 +41,7 @@ impl StackPointer {
 #[cfg(test)]
 mod tests {
   extern crate test;
+  #[cfg(any(target_arch = "x86", target_arch = "x86_64", target_arch = "aarch64"))]
   extern crate simd;
 
   use arch::{self, StackPointer};
@@ -66,6 +68,7 @@ mod tests {
     }
   }
 
+  #[cfg(any(target_arch = "x86", target_arch = "x86_64", target_arch = "aarch64"))]
   #[test]
   fn context_simd() {
     unsafe fn permuter(arg: usize, stack_ptr: StackPointer) {
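Reviewer's sketch: this file selects the backend with `cfg_attr(..., path = ...)` and, after this commit, gates the SIMD test on a narrower set of architectures than the set that has a backend. The two predicates can be compared side by side with `cfg!`. Both functions below are hypothetical helpers written for this note, not part of libfringe, and `or1k` is left out on the assumption that it is not a `target_arch` value stock rustc recognizes.

```rust
/// Which backend file the `cfg_attr` chain would pick on the current target
/// (hypothetical helper; `or1k` is intentionally omitted).
pub fn arch_impl_file() -> Option<&'static str> {
    if cfg!(target_arch = "x86") {
        Some("x86.rs")
    } else if cfg!(target_arch = "x86_64") {
        Some("x86_64.rs")
    } else if cfg!(target_arch = "aarch64") {
        Some("aarch64.rs")
    } else if cfg!(target_arch = "arm") {
        Some("arm.rs")
    } else {
        None
    }
}

/// Whether the `simd` dev-dependency and the `context_simd` test are
/// compiled in, using the same predicate as the diff.
pub fn simd_tests_enabled() -> bool {
    cfg!(any(target_arch = "x86", target_arch = "x86_64", target_arch = "aarch64"))
}
```

On an `arm` target the first function yields `Some("arm.rs")` while the second yields `false`, which is the combination this commit introduces.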

src/arch/x86.rs

Lines changed: 1 addition & 1 deletion
@@ -343,7 +343,7 @@ pub unsafe fn unwind(new_sp: StackPointer, new_stack_base: *mut u8) {
 
   asm!(
     r#"
-      call ${0:c}@plt
+      call ${0:c}
     "#
     :
     : "s" (trampoline as usize)

src/unwind.rs

Lines changed: 3 additions & 1 deletion
@@ -28,7 +28,9 @@ fn have_cross_stack_unwind() -> bool {
   //   for now.
   // - iOS on ARM uses setjmp/longjmp instead of DWARF-2 unwinding, which needs
   //   to be explicitly saved/restored when switching contexts.
-  !(cfg!(windows) || cfg!(all(target_os = "ios", target_arch = "arm")))
+  // - LLVM doesn't currently support ARM EHABI directives in inline assembly so
+  //   we instead need to propagate exceptions manually across contexts.
+  !(cfg!(windows) || cfg!(target_arch = "arm"))
 }
 
 // Wrapper around the root function of a generator which handles unwinding.
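Reviewer's sketch: the changed predicate turns off cross-stack unwinding on every `arm` target, not only iOS. A caller could make the resulting two-way choice explicit along the following lines. `UnwindStrategy` and `unwind_strategy` are hypothetical names introduced for this note; only the body of `have_cross_stack_unwind` is taken from the diff, and how libfringe propagates exceptions manually is not shown in this hunk.

```rust
/// The two ways a panic can cross a context boundary (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UnwindStrategy {
    /// The unwinder walks from the generator stack into the parent stack.
    CrossStack,
    /// The exception is caught at the root of the context and passed back
    /// to the parent by hand.
    Manual,
}

/// Same predicate as in the diff above.
pub fn have_cross_stack_unwind() -> bool {
    !(cfg!(windows) || cfg!(target_arch = "arm"))
}

/// Map the predicate onto the explicit strategy.
pub fn unwind_strategy() -> UnwindStrategy {
    if have_cross_stack_unwind() {
        UnwindStrategy::CrossStack
    } else {
        UnwindStrategy::Manual
    }
}
```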
