common: Fix false sharing in thread_rwlock_*() wrapper
I only fully grokked false sharing fairly recently, and I now have
access to better hardware, which also made this issue more prominent.
I wrote the initial *nix + Windows rwlock wrapper last year when I was
working on making sure scene updates work while the renderer is running
in interactive mode (10d965d).
Moving the bvh_lock rwlock out of the shared scene struct into a
separate heap allocation cuts render time by roughly 44% (16.68 s to
9.28 s) on a 32c/64t AMD EPYC 9374F:
Before:
vkoskiv@Triton:~/c-ray$ hyperfine 'bin/c-ray input/hdr.json -s 128 --no-sdl -j 64'
Benchmark 1: bin/c-ray input/hdr.json -s 128 --no-sdl -j 64
Time (mean ± σ): 16.680 s ± 0.036 s [User: 972.341 s, System: 0.199 s]
Range (min … max): 16.621 s … 16.751 s 10 runs
After:
vkoskiv@Triton:~/c-ray$ hyperfine 'bin/c-ray input/hdr.json -s 128 --no-sdl -j 64'
Benchmark 1: bin/c-ray input/hdr.json -s 128 --no-sdl -j 64
Time (mean ± σ): 9.278 s ± 0.028 s [User: 516.252 s, System: 0.189 s]
Range (min … max): 9.244 s … 9.336 s 10 runs
I didn't notice this issue when I tested it last year, since the
difference is within measurement noise on my 4c/8t CPU at home:
Before:
> hyperfine 'bin/c-ray input/hdr.json --no-sdl -s 32'
Benchmark 1: bin/c-ray input/hdr.json --no-sdl -s 32
Time (mean ± σ): 7.363 s ± 0.074 s [User: 49.317 s, System: 0.146 s]
Range (min … max): 7.271 s … 7.515 s 10 runs
After:
> hyperfine 'bin/c-ray input/hdr.json --no-sdl -s 32'
Benchmark 1: bin/c-ray input/hdr.json --no-sdl -s 32
Time (mean ± σ): 7.359 s ± 0.126 s [User: 49.181 s, System: 0.130 s]
Range (min … max): 7.220 s … 7.598 s 10 runs