Commit 9bd0c7d

Simplify the looping structure of bitmap scanning (#2952)
This PR aims to improve the scanning of the marking bitmap. When the lowest bit in the current word (64 bits) is unset, we now swallow all the trailing zeros, updating the bit position and the current word, and return the former. This amounts to unrolling the inner loop once, because we know that we'll encounter another bit in the current word. Unrolling comes at the cost of a few more instructions in the else leg, but since we `return`, we can eliminate the inner `while` loop altogether, which is a win especially for sparse bitmaps.

## Benchmarks

This optimisation was hinted at in #2927, but not implemented there due to the lack of benchmarking data. Now the `cancan` profile creation benchmark is available, and the GC-relevant cycle-count improvement is

``` shell
[nix-shell:~/motoko]$ ghc -e "100-28402346/29159337*100"
2.5960501090954153
```

about 2.6% compared to the baseline 0.6.16 release. More benchmark data and a graph are added to #2952.

## Implementation concerns

We have to use two shifts (one with a dynamic count and one with a static count) because adding one to the dynamic count could result in a shift of 64 bits, which is undefined behaviour, and Rust traps on that. We could also refrain from testing the lowest bit and go for the `ctz` directly, but that could result in worse generated code by `wasmtime` (?). OTOH that would probably eliminate the `if`, and branchless code is good!

N.B.: For the branchless optimisation the benchmarks look promising, but I prefer to merge this first.
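The two-shift concern above can be illustrated in isolation. The following is a minimal sketch, not code from the RTS: `pop_lowest_set_bit` is a hypothetical helper name, and it only shows why the shift is split into a dynamic part (at most 63) and a static part (exactly 1).

```rust
/// Hypothetical helper (not part of the RTS): removes the lowest set bit
/// and everything below it. Returns the bit's offset within the word and
/// the remaining, shifted-down word.
fn pop_lowest_set_bit(word: u64) -> Option<(u32, u64)> {
    if word == 0 {
        return None;
    }
    // For a non-zero word, trailing_zeros() is in 0..=63.
    let tz = word.trailing_zeros();
    // Dynamic shift by `tz`, then a static shift by 1.
    // A single `word >> (tz + 1)` would shift by 64 when tz == 63.
    let rest = (word >> tz) >> 1;
    Some((tz, rest))
}

/// Shows the problematic single shift via `checked_shr`, which returns
/// `None` when the shift count is 64 or more instead of panicking.
fn single_shift(word: u64) -> Option<u64> {
    word.checked_shr(word.trailing_zeros() + 1)
}
```

For a word with only the top bit set (`1u64 << 63`), `single_shift` yields `None`, while `pop_lowest_set_bit` yields offset 63 and a remaining word of 0.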
1 parent fe09dea commit 9bd0c7d

2 files changed: +13 -4 lines changed


rts/motoko-rts-tests/src/gc/heap.rs

Lines changed: 1 addition & 1 deletion
@@ -206,7 +206,7 @@ impl MotokoHeapInner {
 
         // The Worst-case unalignment w.r.t. 32-byte alignment is 28 (assuming
         // that we have general word alignment). So we over-allocate 28 bytes.
-        let mut heap: Vec<u8> = vec![0; heap_size + 28];
+        let mut heap = vec![0u8; heap_size + 28];
 
         // MarkCompact assumes that the dynamic heap starts at a 32-byte multiple
         let realign = match gc {
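
The figure of 28 in the comment above can be checked with a little arithmetic. This is a sketch under the assumption that "general word alignment" means 4-byte alignment. `padding_to_32` and `worst_case_padding` are hypothetical names, not the `realign` computation from the test suite.

```rust
/// Hypothetical helper: bytes needed to move `addr` up to the next
/// 32-byte multiple (0 if it is already aligned).
fn padding_to_32(addr: usize) -> usize {
    (32 - addr % 32) % 32
}

/// Largest padding over all 4-byte-aligned residues modulo 32.
/// Assumes 4-byte words, so the candidates are 0, 4, ..., 28.
fn worst_case_padding() -> usize {
    (0..32usize)
        .step_by(4)
        .map(padding_to_32)
        .max()
        .unwrap()
}
```

Under that assumption the worst case is an address with residue 4, which needs 28 bytes of padding. That matches the over-allocation in the diff.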

rts/motoko-rts/src/gc/mark_compact/bitmap.rs

Lines changed: 12 additions & 3 deletions
@@ -167,8 +167,8 @@ impl BitmapIter {
 
         // Outer loop iterates 64-bit words
         loop {
-            // Inner loop iterates bits in the current word
-            while self.current_word != 0 {
+            // Inner conditional examines the least significant bit(s) in the current word
+            if self.current_word != 0 {
                 if self.current_word & 0b1 != 0 {
                     let bit_idx = self.current_bit_idx;
                     self.current_word >>= 1;
@@ -177,12 +177,21 @@ impl BitmapIter {
                 } else {
                     let shift_amt = self.current_word.trailing_zeros();
                     self.current_word >>= shift_amt;
-                    self.current_bit_idx += shift_amt;
+                    self.current_word >>= 1;
+                    let bit_idx = self.current_bit_idx + shift_amt;
+                    self.current_bit_idx = bit_idx + 1;
+                    return bit_idx;
                 }
             }
 
             // Move on to next word (always 64-bit boundary)
             self.current_bit_idx += self.leading_zeros;
+            unsafe {
+                debug_assert_eq!(
+                    (self.current_bit_idx - get_bitmap_forbidden_size() as u32 * 8) % 64,
+                    0
+                )
+            }
             if self.current_bit_idx == self.size {
                 return BITMAP_ITER_END;
             }
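
To see the resulting control flow end to end, here is a self-contained sketch that mirrors the structure of the changed loop over a plain slice of 64-bit words. It is not the RTS implementation. `SketchIter`, `SKETCH_ITER_END`, `collect_set_bits` and the word-loading logic are assumptions made for illustration, and it assumes `leading_zeros` is captured when a word is loaded. That assumption makes the jump to the next 64-bit boundary work, because once all set bits are shifted out, the distance to the boundary equals the leading-zero count of the original word.

```rust
/// Hypothetical end marker for this sketch.
const SKETCH_ITER_END: u32 = u32::MAX;

/// Simplified stand-in for the bitmap iterator, scanning a slice of words.
struct SketchIter<'a> {
    words: &'a [u64],
    next_word: usize,     // index of the next word to load
    current_word: u64,    // unscanned bits of the current word, shifted down
    current_bit_idx: u32, // absolute index of bit 0 of `current_word`
    leading_zeros: u32,   // leading zeros of the word as it was loaded
}

impl<'a> SketchIter<'a> {
    fn new(words: &'a [u64]) -> Self {
        SketchIter { words, next_word: 0, current_word: 0, current_bit_idx: 0, leading_zeros: 0 }
    }

    /// Returns the index of the next set bit, or SKETCH_ITER_END.
    fn next(&mut self) -> u32 {
        // Outer loop iterates 64-bit words
        loop {
            if self.current_word != 0 {
                if self.current_word & 0b1 != 0 {
                    let bit_idx = self.current_bit_idx;
                    self.current_word >>= 1;
                    self.current_bit_idx += 1;
                    return bit_idx;
                } else {
                    // Swallow all trailing zeros, then the set bit itself.
                    let shift_amt = self.current_word.trailing_zeros();
                    self.current_word >>= shift_amt; // dynamic count, at most 63
                    self.current_word >>= 1; // static count
                    let bit_idx = self.current_bit_idx + shift_amt;
                    self.current_bit_idx = bit_idx + 1;
                    return bit_idx;
                }
            }
            // Move on to the next word (always a 64-bit boundary).
            self.current_bit_idx += self.leading_zeros;
            debug_assert_eq!(self.current_bit_idx % 64, 0);
            if self.next_word == self.words.len() {
                self.leading_zeros = 0; // keep repeated calls at the boundary
                return SKETCH_ITER_END;
            }
            self.current_word = self.words[self.next_word];
            self.leading_zeros = self.current_word.leading_zeros(); // 64 for a zero word
            self.next_word += 1;
        }
    }
}

/// Convenience wrapper: gathers every set-bit index in order.
fn collect_set_bits(words: &[u64]) -> Vec<u32> {
    let mut iter = SketchIter::new(words);
    let mut out = Vec::new();
    loop {
        let bit = iter.next();
        if bit == SKETCH_ITER_END {
            return out;
        }
        out.push(bit);
    }
}
```

Each call to `next` either returns from inside the conditional or loads another word, so no inner loop is needed. For the words `[0b1001, 0, 1 << 63]`, `collect_set_bits` yields bits 0, 3 and 191.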
