Skip to content

Conversation

@mrdomino
Copy link
Contributor

This reverts commit 5784b13.

@mrdomino
Copy link
Contributor Author

Going to see if I can trace down the hang on CI since I am having a hard time reproducing it locally.

@codecov
Copy link

codecov bot commented Nov 26, 2025

Codecov Report

❌ Patch coverage is 95.65217% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 79.88%. Comparing base (e9f0efd) to head (1f8f1a2).

Files with missing lines Patch % Lines
src/uint/rand.rs 95.45% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1018      +/-   ##
==========================================
+ Coverage   79.86%   79.88%   +0.01%     
==========================================
  Files         163      163              
  Lines       17737    17740       +3     
==========================================
+ Hits        14166    14171       +5     
+ Misses       3571     3569       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@tarcieri
Copy link
Member

tarcieri commented Nov 27, 2025

Yeah, this is really weird, I couldn't reproduce it outside CI

$ uname -a
Linux instance-20251126-235717 6.12.48+deb13-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.12.48-1 (2025-09-20) x86_64 GNU/Linux
$ cross test --target aarch64-unknown-linux-gnu --release
info: syncing channel updates for 'stable-x86_64-unknown-linux-gnu'

  stable-x86_64-unknown-linux-gnu unchanged - rustc 1.91.1 (ed61e7d7e 2025-11-07)

info: checking for self-update
    Finished `release` profile [optimized] target(s) in 0.06s
     Running unittests src/lib.rs (/target/aarch64-unknown-linux-gnu/release/deps/crypto_bigint-3e7ac94d7ecc1145)

running 421 tests
test const_choice::tests::from_u64_lsb ... ok
test const_choice::tests::from_wide_word_le ... ok
test const_choice::tests::from_word_gt ... ok
test const_choice::tests::from_word_lt ... ok
[...]
test src/uint/div.rs - uint::div::Uint<LIMBS>::wrapping_rem_vartime (line 186) ... ok

test result: ok. 19 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.03s
$

@mrdomino
Copy link
Contributor Author

Yeah, indeed. And it passes on --no-default-features.

@mrdomino
Copy link
Contributor Author

mrdomino commented Nov 27, 2025

The test case that’s failing looks to be add_mod_special_10 (which comes lexicographically after add_mod_special_1, which succeeds.)

Far as I can tell, ct_lt is somehow broken on aarch64 linux (at least under docker) in particular:
https://github.com/mrdomino/crypto-bigint/actions/runs/19723784093/job/56511165646?pr=2#step:6:192

p=FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF98F6B315735BF24F
iter 0
mod=[Limb(0x98F6B315735BF24F), Limb(0xFFFFFFFFFFFFFFFF), Limb(0xFFFFFFFFFFFFFFFF), Limb(0xFFFFFFFFFFFFFFFF), Limb(0xFFFFFFFFFFFFFFFF), Limb(0xFFFFFFFFFFFFFFFF), Limb(0xFFFFFFFFFFFFFFFF), Limb(0xFFFFFFFFFFFFFFFF), Limb(0xFFFFFFFFFFFFFFFF), Limb(0xFFFFFFFFFFFFFFFF)]
limbs=[Limb(0x98B82B0336070665), Limb(0x3825A7DC63080D42), Limb(0x489FF253D51BCBE5), Limb(0xB5EA8D36E9E22058), Limb(0x7609DFE07EFF8A17), Limb(0x27143F42B5C5488E), Limb(0xDFF1A71B0AB16013), Limb(0xB4E550B02C806908), Limb(0x7BB6BDB2070C4371), Limb(0x93C3FA5EF11049E3)]
limbs=[Limb(0x19762DE5D5DB8E23), Limb(0x43901A0BF9E9E1A1), Limb(0x7415EFBF72A6FBE2), Limb(0x47BD11DA7391671D), Limb(0xFDB44C31096437DD), Limb(0xE99017B2B621320B), Limb(0xF2764A7CC915F079), Limb(0x221C45E641FE43DD), Limb(0x57FBAE920FC0902B), Limb(0x7AF5168C347039E6)]

That first limbs= sure looks like it’s less than mod, but somehow the loop continues to the next limbs=, and then the one after that, forever.

https://github.com/mrdomino/crypto-bigint/pull/2/files#diff-c2664e1b17aff01fe7094ea120695aac16f338874bb6f47c225c71e022fab578R131-R141

@mrdomino
Copy link
Contributor Author

mrdomino commented Nov 27, 2025

I get the same behavior locally under aarch64-linux in an Alpine VM via UTM on my Mac. Only under --release; debug builds do not behave this way, and instead succeed, taking the first generated number like you’d expect:

iter 0
mod=[Limb(0x98F6B315735BF24F), Limb(0xFFFFFFFFFFFFFFFF), Limb(0xFFFFFFFFFFFFFFFF), Limb(0xFFFFFFFFFFFFFFFF), Limb(0xFFFFFFFFFFFFFFFF), Limb(0xFFFFFFFFFFFFFFFF), Limb(0xFFFFFFFFFFFFFFFF), Limb(0xFFFFFFFFFFFFFFFF), Limb(0xFFFFFFFFFFFFFFFF), Limb(0xFFFFFFFFFFFFFFFF)]
limbs=[Limb(0x98B82B0336070665), Limb(0x3825A7DC63080D42), Limb(0x489FF253D51BCBE5), Limb(0xB5EA8D36E9E22058), Limb(0x7609DFE07EFF8A17), Limb(0x27143F42B5C5488E), Limb(0xDFF1A71B0AB16013), Limb(0xB4E550B02C806908), Limb(0x7BB6BDB2070C4371), Limb(0x93C3FA5EF11049E3)]
a=93C3FA5EF11049E37BB6BDB2070C4371B4E550B02C806908DFF1A71B0AB1601327143F42B5C5488E7609DFE07EFF8A17B5EA8D36E9E22058489FF253D51BCBE53825A7DC63080D4298B82B0336070665
mod=[Limb(0x98F6B315735BF24F), Limb(0xFFFFFFFFFFFFFFFF), Limb(0xFFFFFFFFFFFFFFFF), Limb(0xFFFFFFFFFFFFFFFF), Limb(0xFFFFFFFFFFFFFFFF), Limb(0xFFFFFFFFFFFFFFFF), Limb(0xFFFFFFFFFFFFFFFF), Limb(0xFFFFFFFFFFFFFFFF), Limb(0xFFFFFFFFFFFFFFFF), Limb(0xFFFFFFFFFFFFFFFF)]
limbs=[Limb(0x19762DE5D5DB8E23), Limb(0x43901A0BF9E9E1A1), Limb(0x7415EFBF72A6FBE2), Limb(0x47BD11DA7391671D), Limb(0xFDB44C31096437DD), Limb(0xE99017B2B621320B), Limb(0xF2764A7CC915F079), Limb(0x221C45E641FE43DD), Limb(0x57FBAE920FC0902B), Limb(0x7AF5168C347039E6)]
b=7AF5168C347039E657FBAE920FC0902B221C45E641FE43DDF2764A7CC915F079E99017B2B621320BFDB44C31096437DD47BD11DA7391671D7415EFBF72A6FBE243901A0BF9E9E1A119762DE5D5DB8E23
iter 0 done

@mrdomino
Copy link
Contributor Author

mrdomino commented Nov 27, 2025

Ah, add_mod_special_{2,3,4} all succeed. add_mod_special_5 onwards fails. This is add_mod_special_5 in --release:

iter 0
mod=[Limb(0x98F6B315735BF24F), Limb(0xFFFFFFFFFFFFFFFF), Limb(0xFFFFFFFFFFFFFFFF), Limb(0xFFFFFFFFFFFFFFFF), Limb(0xFFFFFFFFFFFFFFFF)]
limbs=[Limb(0x98B82B0336070665), Limb(0x3825A7DC63080D42), Limb(0x489FF253D51BCBE5), Limb(0xB5EA8D36E9E22058), Limb(0x7609DFE07EFF8A17)]
limbs=[Limb(0x27143F42B5C5488E), Limb(0xDFF1A71B0AB16013), Limb(0xB4E550B02C806908), Limb(0x7BB6BDB2070C4371), Limb(0x93C3FA5EF11049E3)]

This is without --release:

iter 0
mod=[Limb(0x98F6B315735BF24F), Limb(0xFFFFFFFFFFFFFFFF), Limb(0xFFFFFFFFFFFFFFFF), Limb(0xFFFFFFFFFFFFFFFF), Limb(0xFFFFFFFFFFFFFFFF)]
limbs=[Limb(0x98B82B0336070665), Limb(0x3825A7DC63080D42), Limb(0x489FF253D51BCBE5), Limb(0xB5EA8D36E9E22058), Limb(0x7609DFE07EFF8A17)]
a=7609DFE07EFF8A17B5EA8D36E9E22058489FF253D51BCBE53825A7DC63080D4298B82B0336070665
mod=[Limb(0x98F6B315735BF24F), Limb(0xFFFFFFFFFFFFFFFF), Limb(0xFFFFFFFFFFFFFFFF), Limb(0xFFFFFFFFFFFFFFFF), Limb(0xFFFFFFFFFFFFFFFF)]
limbs=[Limb(0x27143F42B5C5488E), Limb(0xDFF1A71B0AB16013), Limb(0xB4E550B02C806908), Limb(0x7BB6BDB2070C4371), Limb(0x93C3FA5EF11049E3)]
b=93C3FA5EF11049E37BB6BDB2070C4371B4E550B02C806908DFF1A71B0AB1601327143F42B5C5488E
iter 0 done

@mrdomino
Copy link
Contributor Author

So to recap: ct_lt looks broken under --release on Linux on aarch64, when the Uint has 5 or more limbs. It works on non-Linux aarch64 or non-aarch64 Linux or non---release or on Uints with 4 or fewer limbs.

Testing a theory that this may get linux-aarch64 to decide to do the
comparison.
@mrdomino

This comment was marked as outdated.

@mrdomino

This comment was marked as outdated.

@mrdomino

This comment was marked as outdated.

@mrdomino
Copy link
Contributor Author

Hoookay, pretty sure we have a rustc/LLVM bug on our hands. Following is recorded for posterity and/or for the LLVM bug report.

I took a different approach: starting from master and working my way down, this does not hang:

diff --git a/src/uint/rand.rs b/src/uint/rand.rs
index d36dc26..5997b3a 100644
--- a/src/uint/rand.rs
+++ b/src/uint/rand.rs
@@ -137,14 +137,10 @@ where

     let hi_word_modulus = modulus.as_ref().as_ref()[n_limbs - 1].0;
     let mask = !0 >> hi_word_modulus.leading_zeros();
-    let mut hi_word = next_word()? & mask;

     loop {
-        while hi_word > hi_word_modulus {
-            hi_word = next_word()? & mask;
-        }
         // Set high limb
-        n.as_mut()[n_limbs - 1] = Limb::from_le_bytes(hi_word.to_le_bytes());
+        n.as_mut()[n_limbs - 1] = Limb::from_le_bytes((next_word()? & mask).to_le_bytes());
         // Set low limbs
         for i in 0..n_limbs - 1 {
             // Need to deserialize from little-endian to make sure that two 32-bit limbs
@@ -157,7 +153,6 @@ where
         if n.ct_lt(modulus).into() {
             break;
         }
-        hi_word = next_word()? & mask;
     }
     Ok(())
 }

This (diff from the prior diff) does:

diff --git a/src/uint/rand.rs b/src/uint/rand.rs
index 5997b3a..1503267 100644
--- a/src/uint/rand.rs
+++ b/src/uint/rand.rs
@@ -139,8 +139,6 @@ where
     let mask = !0 >> hi_word_modulus.leading_zeros();

     loop {
-        // Set high limb
-        n.as_mut()[n_limbs - 1] = Limb::from_le_bytes((next_word()? & mask).to_le_bytes());
         // Set low limbs
         for i in 0..n_limbs - 1 {
             // Need to deserialize from little-endian to make sure that two 32-bit limbs
@@ -148,6 +146,8 @@ where
             // byte stream.
             n.as_mut()[i] = Limb::from_le_bytes(next_word()?.to_le_bytes());
         }
+        // Set high limb
+        n.as_mut()[n_limbs - 1] = Limb::from_le_bytes((next_word()? & mask).to_le_bytes());
         // If the high limb is equal to the modulus' high limb, it's still possible
         // that the full uint is too big so we check and repeat if it is.
         if n.ct_lt(modulus).into() {

If I dump the assembly for a monomorphized random_mod_core at Uint<5>:

diff --git a/src/uint.rs b/src/uint.rs
index a5276cd..1478107 100644
--- a/src/uint.rs
+++ b/src/uint.rs
@@ -2,8 +2,9 @@

 #![allow(clippy::needless_range_loop, clippy::many_single_char_names)]

-use core::fmt;
+use core::{convert::Infallible, fmt};

+use chacha20::ChaCha20Rng;
 #[cfg(feature = "serde")]
 use serdect::serde::{Deserialize, Deserializer, Serialize, Serializer};
 use subtle::{Choice, ConditionallySelectable, ConstantTimeEq};
@@ -61,6 +62,12 @@ pub(crate) mod boxed;
 #[cfg(feature = "rand_core")]
 mod rand;

+/// my func
+#[inline(never)]
+pub fn my_random_mod(rng: &mut ChaCha20Rng, n: &mut Uint<5>, modulus: &NonZero<Uint<5>>, n_bits: u32) -> Result<(), Infallible> {
+    return rand::random_mod_core(rng, n, modulus, n_bits)
+}
+
 /// Stack-allocated big unsigned integer.
 ///
 /// Generic over the given number of `LIMBS`

There is a pretty substantial diff:

--- good_5.s	2025-11-26 21:40:49
+++ bad_5.s	2025-11-26 21:40:45
@@ -31,7 +31,7 @@
 	cinc w8, w8, ne
 	sub x20, x8, #1
 	cmp x20, #4
-	b.hi .LBB2_9
+	b.hi .LBB2_11
 	ldr x8, [x2, x20, lsl #3]
 	mov x9, #-1
 	ldr x27, [x2, #32]
@@ -41,21 +41,27 @@
 	ldp x25, x26, [x2, #16]
 	lsr x24, x9, x8
 	mov x21, x0
-	cbz x20, .LBB2_7
-.LBB2_2:
-	mov x0, x21
-	bl <R as rand_core::TryRngCore>::try_next_u64
+	cbz x20, .LBB2_8
 	mov x28, xzr
-	and x8, x0, x24
-	str x8, [x19, x20, lsl #3]
+	b .LBB2_4
 .LBB2_3:
+	and w0, w8, #0x1
+	bl subtle::black_box
+	tst w0, #0xff
+	mov x28, xzr
+	b.ne .LBB2_10
+.LBB2_4:
 	mov x0, x21
 	bl <R as rand_core::TryRngCore>::try_next_u64
 	add x8, x28, #1
 	str x0, [x19, x28, lsl #3]
-	cmp x20, x8
+	cmp x8, x20
 	mov x28, x8
-	b.ne .LBB2_3
+	b.ne .LBB2_4
+	mov x0, x21
+	bl <R as rand_core::TryRngCore>::try_next_u64
+	and x8, x0, x24
+	str x8, [x19, x20, lsl #3]
 	ldp x8, x9, [x19]
 	cmp x8, x22
 	cset w8, lo
@@ -69,50 +75,24 @@
 	ngc x9, xzr
 	cmp x8, x25
 	sbcs xzr, x9, xzr
+	ldr x9, [x19, #32]
 	cset w8, lt
-	subs x8, x10, x8
-	ldr x10, [x19, #32]
-	ngc x9, xzr
-	cmp x8, x26
-	sbcs xzr, x9, xzr
-	cset w8, lt
-	subs x9, x10, x27
-	ngc x10, xzr
-	cmp x9, x8
-	sbc x8, x10, xzr
+	subs x10, x10, x8
+	ngc x11, xzr
+	subs x9, x9, x27
+	ngc x8, xzr
+	cmp x10, x26
+	sbcs xzr, x11, xzr
+	b.ge .LBB2_3
+	cmp x9, #1
+	cinc x8, x8, hs
+	b .LBB2_3
+.LBB2_7:
 	and w0, w8, #0x1
 	bl subtle::black_box
 	tst w0, #0xff
-	b.eq .LBB2_2
-.LBB2_5:
-	.cfi_def_cfa wsp, 96
-	ldp x20, x19, [sp, #80]
-	ldp x22, x21, [sp, #64]
-	ldp x24, x23, [sp, #48]
-	ldp x26, x25, [sp, #32]
-	ldp x28, x27, [sp, #16]
-	ldp x29, x30, [sp], #96
-	.cfi_def_cfa_offset 0
-	.cfi_restore w19
-	.cfi_restore w20
-	.cfi_restore w21
-	.cfi_restore w22
-	.cfi_restore w23
-	.cfi_restore w24
-	.cfi_restore w25
-	.cfi_restore w26
-	.cfi_restore w27
-	.cfi_restore w28
-	.cfi_restore w30
-	.cfi_restore w29
-	ret
-.LBB2_6:
-	.cfi_restore_state
-	and w0, w8, #0x1
-	bl subtle::black_box
-	tst w0, #0xff
-	b.ne .LBB2_5
-.LBB2_7:
+	b.ne .LBB2_10
+.LBB2_8:
 	mov x0, x21
 	bl <R as rand_core::TryRngCore>::try_next_u64
 	and x8, x0, x24
@@ -138,11 +118,34 @@
 	ngc x8, xzr
 	cmp x10, x26
 	sbcs xzr, x11, xzr
-	b.ge .LBB2_6
+	b.ge .LBB2_7
 	cmp x9, #1
 	cinc x8, x8, hs
-	b .LBB2_6
-.LBB2_9:
+	b .LBB2_7
+.LBB2_10:
+	.cfi_def_cfa wsp, 96
+	ldp x20, x19, [sp, #80]
+	ldp x22, x21, [sp, #64]
+	ldp x24, x23, [sp, #48]
+	ldp x26, x25, [sp, #32]
+	ldp x28, x27, [sp, #16]
+	ldp x29, x30, [sp], #96
+	.cfi_def_cfa_offset 0
+	.cfi_restore w19
+	.cfi_restore w20
+	.cfi_restore w21
+	.cfi_restore w22
+	.cfi_restore w23
+	.cfi_restore w24
+	.cfi_restore w25
+	.cfi_restore w26
+	.cfi_restore w27
+	.cfi_restore w28
+	.cfi_restore w30
+	.cfi_restore w29
+	ret
+.LBB2_11:
+	.cfi_restore_state
 	adrp x2, .Lanon.2cba860f59aa28f6ecc8a20933206b3b.3
 	add x2, x2, :lo12:.Lanon.2cba860f59aa28f6ecc8a20933206b3b.3
 	mov x0, x20

Compared against a much smaller diff if monomorphized to Uint<4>:

--- good_4.s	2025-11-26 21:48:27
+++ not_bad_4.s	2025-11-26 21:48:34
@@ -40,20 +40,19 @@
 	ldp x25, x26, [x2, #16]
 	lsr x22, x9, x8
 	cbz x20, .LBB2_5
-.LBB2_2:
-	mov x0, x21
-	bl <R as rand_core::TryRngCore>::try_next_u64
 	mov x27, xzr
-	and x8, x0, x22
-	str x8, [x19, x20, lsl #3]
 .LBB2_3:
 	mov x0, x21
 	bl <R as rand_core::TryRngCore>::try_next_u64
 	add x8, x27, #1
 	str x0, [x19, x27, lsl #3]
-	cmp x20, x8
+	cmp x8, x20
 	mov x27, x8
 	b.ne .LBB2_3
+	mov x0, x21
+	bl <R as rand_core::TryRngCore>::try_next_u64
+	and x8, x0, x22
+	str x8, [x19, x20, lsl #3]
 	ldp x8, x9, [x19]
 	cmp x8, x23
 	cset w8, lo
@@ -75,7 +74,8 @@
 	and w0, w8, #0x1
 	bl subtle::black_box
 	tst w0, #0xff
-	b.eq .LBB2_2
+	mov x27, xzr
+	b.eq .LBB2_3
 	b .LBB2_6
 .LBB2_5:
 	mov x0, x21

The full bad_5.s (with the hi_word set right before the comparison) is:

.section .text.crypto_bigint::uint::my_random_mod,"ax",@progbits
	.globl	crypto_bigint::uint::my_random_mod
	.p2align	2
.type	crypto_bigint::uint::my_random_mod,@function
crypto_bigint::uint::my_random_mod:
	.cfi_startproc
	stp x29, x30, [sp, #-96]!
	.cfi_def_cfa_offset 96
	stp x28, x27, [sp, #16]
	stp x26, x25, [sp, #32]
	stp x24, x23, [sp, #48]
	stp x22, x21, [sp, #64]
	stp x20, x19, [sp, #80]
	mov x29, sp
	.cfi_def_cfa w29, 96
	.cfi_offset w19, -8
	.cfi_offset w20, -16
	.cfi_offset w21, -24
	.cfi_offset w22, -32
	.cfi_offset w23, -40
	.cfi_offset w24, -48
	.cfi_offset w25, -56
	.cfi_offset w26, -64
	.cfi_offset w27, -72
	.cfi_offset w28, -80
	.cfi_offset w30, -88
	.cfi_offset w29, -96
	.cfi_remember_state
	lsr w8, w3, #6
	tst w3, #0x3f
	cinc w8, w8, ne
	sub x20, x8, #1
	cmp x20, #4
	b.hi .LBB2_11
	ldr x8, [x2, x20, lsl #3]
	mov x9, #-1
	ldr x27, [x2, #32]
	ldp x22, x23, [x2]
	mov x19, x1
	clz x8, x8
	ldp x25, x26, [x2, #16]
	lsr x24, x9, x8
	mov x21, x0
	cbz x20, .LBB2_8
	mov x28, xzr
	b .LBB2_4
.LBB2_3:
	and w0, w8, #0x1
	bl subtle::black_box
	tst w0, #0xff
	mov x28, xzr
	b.ne .LBB2_10
.LBB2_4:
	mov x0, x21
	bl <R as rand_core::TryRngCore>::try_next_u64
	add x8, x28, #1
	str x0, [x19, x28, lsl #3]
	cmp x8, x20
	mov x28, x8
	b.ne .LBB2_4
	mov x0, x21
	bl <R as rand_core::TryRngCore>::try_next_u64
	and x8, x0, x24
	str x8, [x19, x20, lsl #3]
	ldp x8, x9, [x19]
	cmp x8, x22
	cset w8, lo
	subs x8, x9, x8
	ngc x9, xzr
	cmp x8, x23
	ldp x8, x10, [x19, #16]
	sbcs xzr, x9, xzr
	cset w9, lt
	subs x8, x8, x9
	ngc x9, xzr
	cmp x8, x25
	sbcs xzr, x9, xzr
	ldr x9, [x19, #32]
	cset w8, lt
	subs x10, x10, x8
	ngc x11, xzr
	subs x9, x9, x27
	ngc x8, xzr
	cmp x10, x26
	sbcs xzr, x11, xzr
	b.ge .LBB2_3
	cmp x9, #1
	cinc x8, x8, hs
	b .LBB2_3
.LBB2_7:
	and w0, w8, #0x1
	bl subtle::black_box
	tst w0, #0xff
	b.ne .LBB2_10
.LBB2_8:
	mov x0, x21
	bl <R as rand_core::TryRngCore>::try_next_u64
	and x8, x0, x24
	str x8, [x19, x20, lsl #3]
	ldp x8, x9, [x19]
	cmp x8, x22
	cset w8, lo
	subs x8, x9, x8
	ngc x9, xzr
	cmp x8, x23
	ldp x8, x10, [x19, #16]
	sbcs xzr, x9, xzr
	cset w9, lt
	subs x8, x8, x9
	ngc x9, xzr
	cmp x8, x25
	sbcs xzr, x9, xzr
	ldr x9, [x19, #32]
	cset w8, lt
	subs x10, x10, x8
	ngc x11, xzr
	subs x9, x9, x27
	ngc x8, xzr
	cmp x10, x26
	sbcs xzr, x11, xzr
	b.ge .LBB2_7
	cmp x9, #1
	cinc x8, x8, hs
	b .LBB2_7
.LBB2_10:
	.cfi_def_cfa wsp, 96
	ldp x20, x19, [sp, #80]
	ldp x22, x21, [sp, #64]
	ldp x24, x23, [sp, #48]
	ldp x26, x25, [sp, #32]
	ldp x28, x27, [sp, #16]
	ldp x29, x30, [sp], #96
	.cfi_def_cfa_offset 0
	.cfi_restore w19
	.cfi_restore w20
	.cfi_restore w21
	.cfi_restore w22
	.cfi_restore w23
	.cfi_restore w24
	.cfi_restore w25
	.cfi_restore w26
	.cfi_restore w27
	.cfi_restore w28
	.cfi_restore w30
	.cfi_restore w29
	ret
.LBB2_11:
	.cfi_restore_state
	adrp x2, .Lanon.2cba860f59aa28f6ecc8a20933206b3b.3
	add x2, x2, :lo12:.Lanon.2cba860f59aa28f6ecc8a20933206b3b.3
	mov x0, x20
	mov w1, #5
	bl core::panicking::panic_bounds_check

And here is not_bad_4.s:

.section .text.crypto_bigint::uint::my_random_mod,"ax",@progbits
	.globl	crypto_bigint::uint::my_random_mod
	.p2align	2
.type	crypto_bigint::uint::my_random_mod,@function
crypto_bigint::uint::my_random_mod:
	.cfi_startproc
	stp x29, x30, [sp, #-96]!
	.cfi_def_cfa_offset 96
	str x27, [sp, #16]
	stp x26, x25, [sp, #32]
	stp x24, x23, [sp, #48]
	stp x22, x21, [sp, #64]
	stp x20, x19, [sp, #80]
	mov x29, sp
	.cfi_def_cfa w29, 96
	.cfi_offset w19, -8
	.cfi_offset w20, -16
	.cfi_offset w21, -24
	.cfi_offset w22, -32
	.cfi_offset w23, -40
	.cfi_offset w24, -48
	.cfi_offset w25, -56
	.cfi_offset w26, -64
	.cfi_offset w27, -80
	.cfi_offset w30, -88
	.cfi_offset w29, -96
	.cfi_remember_state
	lsr w8, w3, #6
	tst w3, #0x3f
	cinc w8, w8, ne
	sub x20, x8, #1
	cmp x20, #3
	b.hi .LBB2_7
	ldr x8, [x2, x20, lsl #3]
	mov x9, #-1
	mov x19, x1
	ldp x23, x24, [x2]
	mov x21, x0
	clz x8, x8
	ldp x25, x26, [x2, #16]
	lsr x22, x9, x8
	cbz x20, .LBB2_5
	mov x27, xzr
.LBB2_3:
	mov x0, x21
	bl <R as rand_core::TryRngCore>::try_next_u64
	add x8, x27, #1
	str x0, [x19, x27, lsl #3]
	cmp x8, x20
	mov x27, x8
	b.ne .LBB2_3
	mov x0, x21
	bl <R as rand_core::TryRngCore>::try_next_u64
	and x8, x0, x22
	str x8, [x19, x20, lsl #3]
	ldp x8, x9, [x19]
	cmp x8, x23
	cset w8, lo
	subs x8, x9, x8
	ngc x9, xzr
	cmp x8, x24
	ldp x8, x10, [x19, #16]
	sbcs xzr, x9, xzr
	cset w9, lt
	subs x8, x8, x9
	ngc x9, xzr
	cmp x8, x25
	sbcs xzr, x9, xzr
	cset w8, lt
	subs x9, x10, x26
	ngc x10, xzr
	cmp x9, x8
	sbc x8, x10, xzr
	and w0, w8, #0x1
	bl subtle::black_box
	tst w0, #0xff
	mov x27, xzr
	b.eq .LBB2_3
	b .LBB2_6
.LBB2_5:
	mov x0, x21
	bl <R as rand_core::TryRngCore>::try_next_u64
	and x8, x0, x22
	str x8, [x19, x20, lsl #3]
	ldp x8, x9, [x19]
	cmp x8, x23
	cset w8, lo
	subs x8, x9, x8
	ngc x9, xzr
	cmp x8, x24
	ldp x8, x10, [x19, #16]
	sbcs xzr, x9, xzr
	cset w9, lt
	subs x8, x8, x9
	ngc x9, xzr
	cmp x8, x25
	sbcs xzr, x9, xzr
	cset w8, lt
	subs x9, x10, x26
	ngc x10, xzr
	cmp x9, x8
	sbc x8, x10, xzr
	and w0, w8, #0x1
	bl subtle::black_box
	tst w0, #0xff
	b.eq .LBB2_5
.LBB2_6:
	.cfi_def_cfa wsp, 96
	ldp x20, x19, [sp, #80]
	ldr x27, [sp, #16]
	ldp x22, x21, [sp, #64]
	ldp x24, x23, [sp, #48]
	ldp x26, x25, [sp, #32]
	ldp x29, x30, [sp], #96
	.cfi_def_cfa_offset 0
	.cfi_restore w19
	.cfi_restore w20
	.cfi_restore w21
	.cfi_restore w22
	.cfi_restore w23
	.cfi_restore w24
	.cfi_restore w25
	.cfi_restore w26
	.cfi_restore w27
	.cfi_restore w30
	.cfi_restore w29
	ret
.LBB2_7:
	.cfi_restore_state
	adrp x2, .Lanon.c0613eab67b71300c185314e527e149f.3
	add x2, x2, :lo12:.Lanon.c0613eab67b71300c185314e527e149f.3
	mov x0, x20
	mov w1, #4
	bl core::panicking::panic_bounds_check

mrdomino added a commit to mrdomino/crypto-bigint that referenced this pull request Nov 27, 2025
This reverts 5784b13, reapplying 62b90b8. See also:
RustCrypto#1018 (comment)

It turns out that the previous approach ran afoul of a likely LLVM (or
possibly rustc) bug.

This time, I tried removing `random_mod_core` entirely and instead
writing `random_mod` and `try_random_mod` as rejection loops over
`random_bits` and `try_random_bits`. Somehow, this does not (seem to)
run afoul of the bug.
@mrdomino mrdomino closed this Nov 28, 2025
@dvdplm
Copy link
Contributor

dvdplm commented Dec 1, 2025

@mrdomino Impressive sleuthing here, kudos.

tarcieri pushed a commit that referenced this pull request Dec 10, 2025
## Breaking changes

- `Encoding::Repr` is no longer required to implement `Copy`, so
consumers of `Encoding::Repr` will need to explicitly call `clone`.

- The numbers produced by both `random_bits` and `random_mod` will
generally be different, and calling these functions will leave the RNG
in a different state, than before.

## Fixes

- Adds a mitigation for rust-lang/rust#149522, modifying
`Word::borrowing_sub` to use `overflowing_sub` instead of `WideWord`
casts.

## Summary

This essentially applies
#285 to `random_bits` as
well as `random_mod`. Both functions behave the same way now, with the
only difference being that `random_mod` adds rejection sampling;
otherwise both will produce the same numbers over the same entropy
stream.

Questions of platform dependence are now easy; we do not define these
algorithms in terms of sequences of machine words but of bytes. Randomly
sampled `Uint`s are now always constructed little-endian over the
entropy stream. This does not preclude future machine-specific
optimizations, but given how perilous the landscape has been (e.g.
#1018), I’ve elected to err in the direction of parsimony rather than
performance for this change.

This leverages the existing work making `Uint` implement `Encoding`. It
additionally needs `Encoding` on `BoxedUint` to make `RandomMod` and
`RandomBits` work there; this is implemented, but requires dropping the
`Copy` constraint on `Encoding::Repr`.

Fixes #1009

[0]: rust-lang/rust#149522
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants