jkoritzinsky
Member

@jkoritzinsky jkoritzinsky commented Aug 5, 2025

This lets us share more code with NativeAOT, removes a decent amount of complexity from the runtime, and fixes a blocking issue for #117788.

@jkoritzinsky
Member Author

I've improved the uncontended thick-lock case so that it is only about 1.5 ns slower, and I've exhausted all of my ideas.

| Method | Job | Toolchain | Mean | Error | StdDev | Ratio | RatioSD |
|---|---|---|---|---|---|---|---|
| Uncontended | DefaultJob | Default | 15.17 ns | 0.328 ns | 0.291 ns | 1.00 | 0.03 |
| Uncontended | Job-QMWUGV | CoreRun | 16.88 ns | 0.155 ns | 0.137 ns | 1.11 | 0.02 |

At this point it's as fast as the uncontended thin-lock case (within measurement error).
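
For readers checking the numbers, here is a minimal sketch of how the Ratio column relates to the two Mean values in the table above. It only divides the reported means; BenchmarkDotNet derives Ratio from the underlying measurements, so this is an approximation, and the variable names are made up for illustration.

```python
# Mean times (ns) copied from the benchmark table above.
baseline_mean_ns = 15.17  # DefaultJob / Default toolchain
pr_mean_ns = 16.88        # Job-QMWUGV / CoreRun toolchain

# Relative cost of the PR build compared with the baseline.
ratio = pr_mean_ns / baseline_mean_ns
# Absolute extra cost per uncontended operation in this run.
slowdown_ns = pr_mean_ns - baseline_mean_ns

print(f"ratio: {ratio:.2f}")              # prints "ratio: 1.11"
print(f"slowdown: {slowdown_ns:.2f} ns")  # prints "slowdown: 1.71 ns"
```

In this particular run the absolute gap is about 1.7 ns, with reported errors of 0.328 ns and 0.155 ns on the two means.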

Can I get another review pass?

@jkoritzinsky
Member Author

@MihuBot benchmark System.Collections.Concurrent

@MihaZupan
Member

MihaZupan commented Sep 5, 2025

System.Collections.Concurrent.AddRemoveFromDifferentThreads<Int32>.ConcurrentBag(Size: 2000000) got stuck spinning one core.

It does appear to be disabled for AOT already:
https://github.com/dotnet/performance/blob/84d81aab28f1f50b3ac90231411e03e923d94278/src/benchmarks/micro/libraries/System.Collections/Concurrent/AddRemoveFromDifferentThreads.cs#L19-L20

Here's a core dump if that helps: https://1drv.ms/f/c/17a1c1fca6517cd3/EhG6thpcjJFAsd09HR6rxHwBEGOhuj-rKC_j_WKUg5P0rg

@jkotas
Member

jkotas commented Sep 5, 2025

> System.Collections.Concurrent.AddRemoveFromDifferentThreads<Int32>.ConcurrentBag(Size: 2000000) got stuck spinning one core.

Sounds like a bug that we are "porting" to coreclr now?

@jkoritzinsky
Member Author

I'll take a look at the failure. It's possible it's the same as the one in the PR checks here (which is new as of 2 days ago, so that's fun).

Otherwise, I'd bet that we need to introduce a Thread.Yield call somewhere in System.Threading.Lock where we used to yield in the AwareLock impl.
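
To illustrate the general idea behind that guess, here is a toy spin-then-yield lock. It is a sketch only: it is written in Python, it is not the System.Threading.Lock or AwareLock implementation, and `SpinThenYieldLock`, `spins_before_yield`, and `run_demo` are hypothetical names. `time.sleep(0)` stands in for a Thread.Yield call.

```python
import threading
import time


class SpinThenYieldLock:
    """Toy lock: spin a bounded number of times, then yield to other threads."""

    def __init__(self, spins_before_yield: int = 100):
        self._inner = threading.Lock()
        self._spins_before_yield = spins_before_yield

    def acquire(self) -> None:
        spins = 0
        # Non-blocking attempts model the spinning phase of a lock.
        while not self._inner.acquire(blocking=False):
            spins += 1
            if spins >= self._spins_before_yield:
                # Stand-in for Thread.Yield: let the lock holder make progress
                # instead of burning a core on failed attempts.
                time.sleep(0)
                spins = 0

    def release(self) -> None:
        self._inner.release()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, *exc):
        self.release()


def run_demo(threads: int = 4, increments: int = 2000) -> int:
    """Increment a shared counter from several threads under the toy lock."""
    lock = SpinThenYieldLock()
    counter = 0

    def worker() -> None:
        nonlocal counter
        for _ in range(increments):
            with lock:
                counter += 1

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return counter
```

Calling `run_demo()` returns `threads * increments` when the lock is correct. Without the yield, a waiter on a machine with few cores can keep spinning while the thread that holds the lock waits for CPU time, which is the kind of symptom a missing yield would produce.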

@jkoritzinsky
Member Author

Looking at the linked issue, I think the NativeAOT failure was due to #67805 (linked from #66987). I also think that #73033 may have contributed to fixing it.

I think the failures from this PR's run of the benchmark were due to bugs in the lock-free algorithms I wrote (same cause as the PR failures).

@jkotas
Member

jkotas commented Sep 10, 2025

@MihuBot benchmark System.Collections.Concurrent

@MihuBot

MihuBot commented Sep 10, 2025

System.Collections.Concurrent.IsEmpty_String_
BenchmarkDotNet v0.14.1-nightly.20250107.205, Linux Ubuntu 22.04.5 LTS (Jammy Jellyfish)
AMD EPYC 9V74, 1 CPU, 8 logical and 4 physical cores
  Job-CXNRMC : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-OOWMKK : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
OutlierMode=Default  PowerPlanMode=00000000-0000-0000-0000-000000000000  IterationTime=250ms
MaxIterationCount=20  MemoryRandomization=Default  MinIterationCount=15
WarmupCount=1
Method Toolchain Size Mean Error Ratio Allocated Alloc Ratio
Dictionary Main 0 60.3069 ns 0.3031 ns 1.00 - NA
Dictionary PR 0 67.6643 ns 0.0902 ns 1.12 - NA
Queue Main 0 1.7502 ns 0.0138 ns 1.00 - NA
Queue PR 0 2.1883 ns 0.0106 ns 1.25 - NA
Stack Main 0 0.0005 ns 0.0003 ns ? - ?
Stack PR 0 0.0007 ns 0.0011 ns ? - ?
Bag Main 0 6.4018 ns 0.0270 ns 1.00 - NA
Bag PR 0 7.2359 ns 0.0168 ns 1.13 - NA
Dictionary Main 512 2.9015 ns 0.0024 ns 1.00 - NA
Dictionary PR 512 2.9077 ns 0.0080 ns 1.00 - NA
Queue Main 512 1.3167 ns 0.0343 ns 1.00 - NA
Queue PR 512 1.2773 ns 0.0052 ns 0.97 - NA
Stack Main 512 0.0010 ns 0.0013 ns ? - ?
Stack PR 512 0.0006 ns 0.0004 ns ? - ?
Bag Main 512 5.9004 ns 0.0138 ns 1.00 - NA
Bag PR 512 6.4420 ns 0.0157 ns 1.09 - NA
System.Collections.Concurrent.IsEmpty_Int32_
BenchmarkDotNet v0.14.1-nightly.20250107.205, Linux Ubuntu 22.04.5 LTS (Jammy Jellyfish)
AMD EPYC 9V74, 1 CPU, 8 logical and 4 physical cores
  Job-CXNRMC : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-OOWMKK : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
OutlierMode=Default  PowerPlanMode=00000000-0000-0000-0000-000000000000  IterationTime=250ms
MaxIterationCount=20  MemoryRandomization=Default  MinIterationCount=15
WarmupCount=1
Method Toolchain Size Mean Error Ratio Allocated Alloc Ratio
Dictionary Main 0 60.3510 ns 0.4007 ns 1.00 - NA
Dictionary PR 0 66.1922 ns 0.0347 ns 1.10 - NA
Queue Main 0 2.0083 ns 0.0117 ns 1.00 - NA
Queue PR 0 2.2453 ns 0.0133 ns 1.12 - NA
Stack Main 0 0.0017 ns 0.0003 ns 1.03 - NA
Stack PR 0 0.0001 ns 0.0001 ns 0.06 - NA
Bag Main 0 3.6648 ns 0.0194 ns 1.00 - NA
Bag PR 0 3.6604 ns 0.0138 ns 1.00 - NA
Dictionary Main 512 2.9330 ns 0.0019 ns 1.00 - NA
Dictionary PR 512 3.0051 ns 0.0205 ns 1.02 - NA
Queue Main 512 1.2885 ns 0.0058 ns 1.00 - NA
Queue PR 512 1.2807 ns 0.0039 ns 0.99 - NA
Stack Main 512 0.0012 ns 0.0013 ns ? - ?
Stack PR 512 0.0000 ns 0.0001 ns ? - ?
Bag Main 512 3.0776 ns 0.0147 ns 1.00 - NA
Bag PR 512 3.0538 ns 0.0091 ns 0.99 - NA
System.Collections.Concurrent.Count_String_
BenchmarkDotNet v0.14.1-nightly.20250107.205, Linux Ubuntu 22.04.5 LTS (Jammy Jellyfish)
AMD EPYC 9V74, 1 CPU, 8 logical and 4 physical cores
  Job-CXNRMC : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-OOWMKK : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
OutlierMode=Default  PowerPlanMode=00000000-0000-0000-0000-000000000000  IterationTime=250ms
MaxIterationCount=20  MemoryRandomization=Default  MinIterationCount=15
WarmupCount=1
Method Toolchain Size Mean Error Ratio Allocated Alloc Ratio
Dictionary Main 512 58.478 ns 0.1171 ns 1.00 - NA
Dictionary PR 512 64.779 ns 0.0591 ns 1.11 - NA
Queue Main 512 2.578 ns 0.0140 ns 1.00 - NA
Queue PR 512 2.563 ns 0.0132 ns 0.99 - NA
Queue_EnqueueCountDequeue Main 512 13.289 ns 0.0679 ns 1.00 - NA
Queue_EnqueueCountDequeue PR 512 13.150 ns 0.0398 ns 0.99 - NA
Stack Main 512 565.748 ns 0.1018 ns 1.00 - NA
Stack PR 512 565.683 ns 0.2087 ns 1.00 - NA
Bag Main 512 17.590 ns 0.0718 ns 1.00 - NA
Bag PR 512 17.101 ns 0.0325 ns 0.97 - NA
System.Collections.Concurrent.Count_Int32_
BenchmarkDotNet v0.14.1-nightly.20250107.205, Linux Ubuntu 22.04.5 LTS (Jammy Jellyfish)
AMD EPYC 9V74, 1 CPU, 8 logical and 4 physical cores
  Job-CXNRMC : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-OOWMKK : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
OutlierMode=Default  PowerPlanMode=00000000-0000-0000-0000-000000000000  IterationTime=250ms
MaxIterationCount=20  MemoryRandomization=Default  MinIterationCount=15
WarmupCount=1
Method Toolchain Size Mean Error Ratio Allocated Alloc Ratio
Dictionary Main 512 58.177 ns 0.1263 ns 1.00 - NA
Dictionary PR 512 66.273 ns 0.0329 ns 1.14 - NA
Queue Main 512 2.406 ns 0.0267 ns 1.00 - NA
Queue PR 512 2.362 ns 0.0115 ns 0.98 - NA
Queue_EnqueueCountDequeue Main 512 11.228 ns 0.0710 ns 1.00 - NA
Queue_EnqueueCountDequeue PR 512 11.439 ns 0.0572 ns 1.02 - NA
Stack Main 512 566.394 ns 0.1977 ns 1.00 - NA
Stack PR 512 565.759 ns 0.2387 ns 1.00 - NA
Bag Main 512 15.840 ns 0.0750 ns 1.00 - NA
Bag PR 512 17.014 ns 0.0142 ns 1.07 - NA
System.Collections.Concurrent.AddRemoveFromSameThreads_String_
BenchmarkDotNet v0.14.1-nightly.20250107.205, Linux Ubuntu 22.04.5 LTS (Jammy Jellyfish)
AMD EPYC 9V74, 1 CPU, 8 logical and 4 physical cores
  Job-EKTIQB : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-TCXCDL : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
OutlierMode=Default  PowerPlanMode=00000000-0000-0000-0000-000000000000  InvocationCount=1
IterationTime=250ms  MaxIterationCount=20  MaxWarmupIterationCount=10
MemoryRandomization=Default  MinIterationCount=15  MinWarmupIterationCount=6
UnrollFactor=1  WarmupCount=-1
Method Toolchain Size Mean Error Ratio Allocated Alloc Ratio
ConcurrentBag Main 2000000 234.7 ms 8.60 ms 1.00 1.51 KB 1.00
ConcurrentBag PR 2000000 234.2 ms 10.32 ms 1.00 1.46 KB 0.97
ConcurrentStack Main 2000000 106.6 ms 7.86 ms 1.01 125000.68 KB 1.00
ConcurrentStack PR 2000000 118.6 ms 3.58 ms 1.12 125000.63 KB 1.00
ConcurrentQueue Main 2000000 362.3 ms 13.26 ms 1.00 32.77 KB 1.00
ConcurrentQueue PR 2000000 348.0 ms 12.36 ms 0.96 16.51 KB 0.50
System.Collections.Concurrent.AddRemoveFromSameThreads_Int32_
BenchmarkDotNet v0.14.1-nightly.20250107.205, Linux Ubuntu 22.04.5 LTS (Jammy Jellyfish)
AMD EPYC 9V74, 1 CPU, 8 logical and 4 physical cores
  Job-EKTIQB : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-TCXCDL : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
OutlierMode=Default  PowerPlanMode=00000000-0000-0000-0000-000000000000  InvocationCount=1
IterationTime=250ms  MaxIterationCount=20  MaxWarmupIterationCount=10
MemoryRandomization=Default  MinIterationCount=15  MinWarmupIterationCount=6
UnrollFactor=1  WarmupCount=-1
Method Toolchain Size Mean Error Ratio Allocated Alloc Ratio
ConcurrentBag Main 2000000 162.77 ms 20.699 ms 1.02 1.23 KB 1.00
ConcurrentBag PR 2000000 173.14 ms 11.390 ms 1.09 1.18 KB 0.96
ConcurrentStack Main 2000000 74.98 ms 4.917 ms 1.01 125000.4 KB 1.00
ConcurrentStack PR 2000000 75.81 ms 7.255 ms 1.02 125000.91 KB 1.00
ConcurrentQueue Main 2000000 344.56 ms 9.666 ms 1.00 33.55 KB 1.00
ConcurrentQueue PR 2000000 346.67 ms 6.745 ms 1.01 33.77 KB 1.01
System.Collections.Concurrent.AddRemoveFromDifferentThreads_String_
BenchmarkDotNet v0.14.1-nightly.20250107.205, Linux Ubuntu 22.04.5 LTS (Jammy Jellyfish)
AMD EPYC 9V74, 1 CPU, 8 logical and 4 physical cores
  Job-EKTIQB : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-TCXCDL : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
OutlierMode=Default  PowerPlanMode=00000000-0000-0000-0000-000000000000  InvocationCount=1
IterationTime=250ms  MaxIterationCount=20  MaxWarmupIterationCount=10
MemoryRandomization=Default  MinIterationCount=15  MinWarmupIterationCount=6
UnrollFactor=1  WarmupCount=-1
Method Toolchain Size Mean Error Ratio Allocated Alloc Ratio
ConcurrentBag Main 2000000 173.84 ms 23.256 ms 1.03 32 MB 1.00
ConcurrentBag PR 2000000 186.80 ms 20.239 ms 1.10 32 MB 1.00
ConcurrentStack Main 2000000 65.35 ms 8.207 ms 1.02 61.04 MB 1.00
ConcurrentStack PR 2000000 62.36 ms 10.825 ms 0.97 61.04 MB 1.00
ConcurrentQueue Main 2000000 66.13 ms 11.784 ms 1.06 32 MB 1.00
ConcurrentQueue PR 2000000 46.68 ms 15.384 ms 0.75 8 MB 0.25
System.Collections.Concurrent.AddRemoveFromDifferentThreads_Int32_
BenchmarkDotNet v0.14.1-nightly.20250107.205, Linux Ubuntu 22.04.5 LTS (Jammy Jellyfish)
AMD EPYC 9V74, 1 CPU, 8 logical and 4 physical cores
  Job-EKTIQB : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-TCXCDL : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
OutlierMode=Default  PowerPlanMode=00000000-0000-0000-0000-000000000000  InvocationCount=1
IterationTime=250ms  MaxIterationCount=20  MaxWarmupIterationCount=10
MemoryRandomization=Default  MinIterationCount=15  MinWarmupIterationCount=6
UnrollFactor=1  WarmupCount=-1
Method Toolchain Size Mean Error Ratio Allocated Alloc Ratio
ConcurrentBag Main 2000000 172.54 ms 27.029 ms 1.04 16 MB 1.00
ConcurrentBag PR 2000000 174.39 ms 28.161 ms 1.05 16 MB 1.00
ConcurrentStack Main 2000000 61.98 ms 6.693 ms 1.02 61.04 MB 1.00
ConcurrentStack PR 2000000 53.91 ms 7.852 ms 0.88 61.04 MB 1.00
ConcurrentQueue Main 2000000 37.24 ms 13.043 ms 1.19 4 MB 1.00
ConcurrentQueue PR 2000000 43.85 ms 12.724 ms 1.40 2 MB 0.50

@jkotas
Member

jkotas commented Sep 10, 2025

@MihuBot benchmark System.Threading

@MihuBot

MihuBot commented Sep 10, 2025

System.Threading.Tests.Perf_Volatile
BenchmarkDotNet v0.14.1-nightly.20250107.205, Linux Ubuntu 22.04.5 LTS (Jammy Jellyfish)
AMD EPYC 9V74, 1 CPU, 8 logical and 4 physical cores
  Job-TLUDJO : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-QKIVUI : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-BOPHZX : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-BLODRC : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
PowerPlanMode=00000000-0000-0000-0000-000000000000  IterationTime=250ms  MaxIterationCount=20
MinIterationCount=15  WarmupCount=1
Method Toolchain Mean Error Ratio Allocated Alloc Ratio
Write_double Main 0.0006 ns 0.0003 ns 1.22 - NA
Write_double PR 0.0000 ns 0.0001 ns 0.06 - NA
Read_double Main 0.0000 ns 0.0000 ns ? - ?
Read_double PR 0.0000 ns 0.0000 ns ? - ?
System.Threading.Tests.Perf_Timer
BenchmarkDotNet v0.14.1-nightly.20250107.205, Linux Ubuntu 22.04.5 LTS (Jammy Jellyfish)
AMD EPYC 9V74, 1 CPU, 8 logical and 4 physical cores
  Job-BOPHZX : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-BLODRC : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-TLUDJO : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-QKIVUI : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
PowerPlanMode=00000000-0000-0000-0000-000000000000  IterationTime=250ms  MaxIterationCount=20
MinIterationCount=15  WarmupCount=1
Method Toolchain Mean Error Ratio Allocated Alloc Ratio
ShortScheduleAndDispose Main 82.86 ns 0.914 ns 1.00 120 B 1.00
ShortScheduleAndDispose PR 84.02 ns 0.626 ns 1.01 120 B 1.00
LongScheduleAndDispose Main 85.39 ns 0.774 ns 1.00 120 B 1.00
LongScheduleAndDispose PR 83.42 ns 0.884 ns 0.98 120 B 1.00
ScheduleManyThenDisposeMany Main 248,429,539.45 ns 4,893,236.838 ns 1.00 144001192 B 1.00
ScheduleManyThenDisposeMany PR 250,414,633.12 ns 4,806,916.668 ns 1.01 144000840 B 1.00
ShortScheduleAndDisposeWithFiringTimers Main 94.40 ns 3.273 ns 1.00 144 B 1.00
ShortScheduleAndDisposeWithFiringTimers PR 94.41 ns 3.250 ns 1.00 144 B 1.00
SynchronousContention Main 1,419,683,585.36 ns 8,332,486.154 ns 1.00 1152001384 B 1.00
SynchronousContention PR 1,473,812,010.27 ns 15,917,775.117 ns 1.04 1152001744 B 1.00
AsynchronousContention Main 1,121,848,992.35 ns 29,153,217.234 ns 1.00 1152002568 B 1.00
AsynchronousContention PR 1,084,313,660.75 ns 12,161,717.337 ns 0.97 1152002648 B 1.00
System.Threading.Tests.Perf_ThreadStatic
BenchmarkDotNet v0.14.1-nightly.20250107.205, Linux Ubuntu 22.04.5 LTS (Jammy Jellyfish)
AMD EPYC 9V74, 1 CPU, 8 logical and 4 physical cores
  Job-TLUDJO : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-QKIVUI : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
OutlierMode=Default  PowerPlanMode=00000000-0000-0000-0000-000000000000  IterationTime=250ms
MaxIterationCount=20  MemoryRandomization=Default  MinIterationCount=15
WarmupCount=1
Method Toolchain Mean Error Ratio Allocated Alloc Ratio
GetThreadStatic Main 2.231 ns 0.1305 ns 1.00 - NA
GetThreadStatic PR 1.347 ns 0.0021 ns 0.61 - NA
SetThreadStatic Main 2.730 ns 0.0020 ns 1.00 - NA
SetThreadStatic PR 2.981 ns 0.0005 ns 1.09 - NA
System.Threading.Tests.Perf_ThreadPool
BenchmarkDotNet v0.14.1-nightly.20250107.205, Linux Ubuntu 22.04.5 LTS (Jammy Jellyfish)
AMD EPYC 9V74, 1 CPU, 8 logical and 4 physical cores
  Job-TLUDJO : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-QKIVUI : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
OutlierMode=Default  PowerPlanMode=00000000-0000-0000-0000-000000000000  IterationTime=250ms
MaxIterationCount=20  MemoryRandomization=Default  MinIterationCount=15
WarmupCount=1  Gen0=38000.0000
Method Toolchain WorkItemsPerCore Mean Error Ratio Allocated Alloc Ratio
QueueUserWorkItem_WaitCallback_Throughput Main 20000000 2.164 s 0.0081 s 1.00 610.35 MB 1.00
QueueUserWorkItem_WaitCallback_Throughput PR 20000000 2.177 s 0.0206 s 1.01 610.35 MB 1.00
System.Threading.Tests.Perf_Thread
BenchmarkDotNet v0.14.1-nightly.20250107.205, Linux Ubuntu 22.04.5 LTS (Jammy Jellyfish)
AMD EPYC 9V74, 1 CPU, 8 logical and 4 physical cores
  Job-TLUDJO : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-QKIVUI : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-BOPHZX : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-BLODRC : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
PowerPlanMode=00000000-0000-0000-0000-000000000000  IterationTime=250ms  MaxIterationCount=20
MinIterationCount=15  WarmupCount=1
Method Toolchain Mean Error Ratio Allocated Alloc Ratio
CurrentThread Main 1.926 ns 0.0010 ns 1.00 - NA
CurrentThread PR 2.053 ns 0.1157 ns 1.07 - NA
GetCurrentProcessorId Main 1.638 ns 0.0050 ns 1.00 - NA
GetCurrentProcessorId PR 1.637 ns 0.0004 ns 1.00 - NA
System.Threading.Tests.Perf_SpinLock
BenchmarkDotNet v0.14.1-nightly.20250107.205, Linux Ubuntu 22.04.5 LTS (Jammy Jellyfish)
AMD EPYC 9V74, 1 CPU, 8 logical and 4 physical cores
  Job-TLUDJO : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-QKIVUI : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
OutlierMode=Default  PowerPlanMode=00000000-0000-0000-0000-000000000000  IterationTime=250ms
MaxIterationCount=20  MemoryRandomization=Default  MinIterationCount=15
WarmupCount=1
Method Toolchain Mean Error Ratio Allocated Alloc Ratio
EnterExit Main 2.9704 ns 0.0009 ns 1.00 - NA
EnterExit PR 2.9755 ns 0.0064 ns 1.00 - NA
TryEnterExit Main 2.9753 ns 0.0029 ns 1.00 - NA
TryEnterExit PR 2.9738 ns 0.0044 ns 1.00 - NA
TryEnter_Fail Main 0.9914 ns 0.0007 ns 1.00 - NA
TryEnter_Fail PR 0.9900 ns 0.0006 ns 1.00 - NA
System.Threading.Tests.Perf_SemaphoreSlim
BenchmarkDotNet v0.14.1-nightly.20250107.205, Linux Ubuntu 22.04.5 LTS (Jammy Jellyfish)
AMD EPYC 9V74, 1 CPU, 8 logical and 4 physical cores
  Job-TLUDJO : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-QKIVUI : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
OutlierMode=Default  PowerPlanMode=00000000-0000-0000-0000-000000000000  IterationTime=250ms
MaxIterationCount=20  MemoryRandomization=Default  MinIterationCount=15
WarmupCount=1
Method Toolchain Mean Error Ratio Allocated Alloc Ratio
ReleaseWait Main 21.47 ns 0.022 ns 1.00 - NA
ReleaseWait PR 23.81 ns 0.121 ns 1.11 - NA
ReleaseWaitAsync Main 21.60 ns 0.031 ns 1.00 - NA
ReleaseWaitAsync PR 23.34 ns 0.106 ns 1.08 - NA
ReleaseWaitAsync_WithCancellationToken Main 769.73 ns 7.134 ns 1.00 376 B 1.00
ReleaseWaitAsync_WithCancellationToken PR 735.11 ns 11.638 ns 0.96 376 B 1.00
ReleaseWaitAsync_WithTimeout Main 790.34 ns 14.895 ns 1.00 472 B 1.00
ReleaseWaitAsync_WithTimeout PR 789.72 ns 10.004 ns 1.00 472 B 1.00
ReleaseWaitAsync_WithCancellationTokenAndTimeout Main 847.20 ns 7.681 ns 1.00 472 B 1.00
ReleaseWaitAsync_WithCancellationTokenAndTimeout PR 871.74 ns 13.513 ns 1.03 472 B 1.00
System.Threading.Tests.Perf_Monitor
BenchmarkDotNet v0.14.1-nightly.20250107.205, Linux Ubuntu 22.04.5 LTS (Jammy Jellyfish)
AMD EPYC 9V74, 1 CPU, 8 logical and 4 physical cores
  Job-TLUDJO : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-QKIVUI : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
OutlierMode=Default  PowerPlanMode=00000000-0000-0000-0000-000000000000  IterationTime=250ms
MaxIterationCount=20  MemoryRandomization=Default  MinIterationCount=15
WarmupCount=1
Method Toolchain Mean Error Ratio Allocated Alloc Ratio
EnterExit Main 6.799 ns 0.0034 ns 1.00 - NA
EnterExit PR 6.621 ns 0.0081 ns 0.97 - NA
TryEnterExit Main 7.362 ns 0.0375 ns 1.00 - NA
TryEnterExit PR 6.636 ns 0.0161 ns 0.90 - NA
System.Threading.Tests.Perf_Lock
BenchmarkDotNet v0.14.1-nightly.20250107.205, Linux Ubuntu 22.04.5 LTS (Jammy Jellyfish)
AMD EPYC 9V74, 1 CPU, 8 logical and 4 physical cores
  Job-TLUDJO : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-QKIVUI : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
OutlierMode=Default  PowerPlanMode=00000000-0000-0000-0000-000000000000  IterationTime=250ms
MaxIterationCount=20  MemoryRandomization=Default  MinIterationCount=15
WarmupCount=1
Method Toolchain Mean Error Ratio Allocated Alloc Ratio
ReaderWriterLockSlimPerf Main 9.731 ns 0.0306 ns 1.00 - NA
ReaderWriterLockSlimPerf PR 10.614 ns 0.0248 ns 1.09 - NA
System.Threading.Tests.Perf_Interlocked
BenchmarkDotNet v0.14.1-nightly.20250107.205, Linux Ubuntu 22.04.5 LTS (Jammy Jellyfish)
AMD EPYC 9V74, 1 CPU, 8 logical and 4 physical cores
  Job-TLUDJO : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-QKIVUI : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
OutlierMode=Default  PowerPlanMode=00000000-0000-0000-0000-000000000000  IterationTime=250ms
MaxIterationCount=20  MemoryRandomization=Default  MinIterationCount=15
WarmupCount=1
Method Toolchain Mean Error Ratio Allocated Alloc Ratio
Increment_int Main 0.6323 ns 0.0022 ns 1.00 - NA
Increment_int PR 0.6321 ns 0.0035 ns 1.00 - NA
Decrement_int Main 0.6330 ns 0.0031 ns 1.00 - NA
Decrement_int PR 0.6344 ns 0.0016 ns 1.00 - NA
Increment_long Main 0.6380 ns 0.0025 ns 1.00 - NA
Increment_long PR 0.6380 ns 0.0020 ns 1.00 - NA
Decrement_long Main 0.6318 ns 0.0040 ns 1.00 - NA
Decrement_long PR 0.6342 ns 0.0030 ns 1.00 - NA
Add_int Main 0.6317 ns 0.0021 ns 1.00 - NA
Add_int PR 0.6318 ns 0.0033 ns 1.00 - NA
Add_long Main 0.6315 ns 0.0056 ns 1.00 - NA
Add_long PR 0.6331 ns 0.0028 ns 1.00 - NA
Exchange_int Main 0.7081 ns 0.0005 ns 1.00 - NA
Exchange_int PR 0.7082 ns 0.0009 ns 1.00 - NA
Exchange_long Main 0.7081 ns 0.0005 ns 1.00 - NA
Exchange_long PR 0.7077 ns 0.0003 ns 1.00 - NA
CompareExchange_int Main 0.9255 ns 0.0019 ns 1.00 - NA
CompareExchange_int PR 0.9247 ns 0.0006 ns 1.00 - NA
CompareExchange_long Main 0.9257 ns 0.0009 ns 1.00 - NA
CompareExchange_long PR 0.9247 ns 0.0019 ns 1.00 - NA
CompareExchange_object_Match Main 1.3158 ns 0.1043 ns 1.01 - NA
CompareExchange_object_Match PR 0.9464 ns 0.1192 ns 0.73 - NA
CompareExchange_object_NoMatch Main 1.1155 ns 0.0082 ns 1.00 - NA
CompareExchange_object_NoMatch PR 1.2995 ns 0.1546 ns 1.16 - NA
System.Threading.Tests.Perf_EventWaitHandle
BenchmarkDotNet v0.14.1-nightly.20250107.205, Linux Ubuntu 22.04.5 LTS (Jammy Jellyfish)
AMD EPYC 9V74, 1 CPU, 8 logical and 4 physical cores
  Job-TLUDJO : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-QKIVUI : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
OutlierMode=Default  PowerPlanMode=00000000-0000-0000-0000-000000000000  IterationTime=250ms
MaxIterationCount=20  MemoryRandomization=Default  MinIterationCount=15
WarmupCount=1  Median=153.7 ns
Method Toolchain Mean Error Ratio Allocated Alloc Ratio
Set_Reset Main 153.7 ns 0.35 ns 1.00 - NA
Set_Reset PR 153.5 ns 0.41 ns 1.00 - NA
System.Threading.Tests.Perf_CancellationToken
BenchmarkDotNet v0.14.1-nightly.20250107.205, Linux Ubuntu 22.04.5 LTS (Jammy Jellyfish)
AMD EPYC 9V74, 1 CPU, 8 logical and 4 physical cores
  Job-TLUDJO : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-QKIVUI : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-BOPHZX : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-BLODRC : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
PowerPlanMode=00000000-0000-0000-0000-000000000000  IterationTime=250ms  MaxIterationCount=20
MinIterationCount=15  WarmupCount=1
Method Toolchain Mean Error Ratio Allocated Alloc Ratio
RegisterAndUnregister_Serial Main 24.722 ns 0.4269 ns 1.00 - NA
RegisterAndUnregister_Serial PR 21.982 ns 0.5715 ns 0.89 - NA
Cancel Main 45.318 ns 0.4535 ns 1.00 192 B 1.00
Cancel PR 46.471 ns 0.3645 ns 1.03 192 B 1.00
CreateLinkedTokenSource1 Main 28.261 ns 0.5563 ns 1.00 64 B 1.00
CreateLinkedTokenSource1 PR 27.548 ns 0.6255 ns 0.98 64 B 1.00
CreateLinkedTokenSource2 Main 44.228 ns 0.8997 ns 1.00 80 B 1.00
CreateLinkedTokenSource2 PR 44.838 ns 0.9604 ns 1.01 80 B 1.00
CreateLinkedTokenSource3 Main 72.391 ns 0.5782 ns 1.00 128 B 1.00
CreateLinkedTokenSource3 PR 72.472 ns 1.0065 ns 1.00 128 B 1.00
CreateTokenDispose Main 6.282 ns 0.0123 ns 1.00 48 B 1.00
CreateTokenDispose PR 5.993 ns 0.0109 ns 0.95 48 B 1.00
CreateRegisterDispose Main 38.807 ns 0.1966 ns 1.00 192 B 1.00
CreateRegisterDispose PR 38.616 ns 0.1736 ns 1.00 192 B 1.00
CreateManyRegisterDispose Main 12.571 ns 0.0575 ns 1.00 - NA
CreateManyRegisterDispose PR 12.797 ns 0.1724 ns 1.02 - NA
CreateManyRegisterMultipleDispose Main 95.185 ns 6.9869 ns 1.01 - NA
CreateManyRegisterMultipleDispose PR 95.458 ns 6.9062 ns 1.01 - NA
CancelAfter Main 59.553 ns 0.6270 ns 1.00 144 B 1.00
CancelAfter PR 57.732 ns 0.8863 ns 0.97 144 B 1.00
System.Threading.Tasks.Tests.Perf_AsyncMethods
BenchmarkDotNet v0.14.1-nightly.20250107.205, Linux Ubuntu 22.04.5 LTS (Jammy Jellyfish)
AMD EPYC 9V74, 1 CPU, 8 logical and 4 physical cores
  Job-TLUDJO : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-QKIVUI : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
OutlierMode=Default  PowerPlanMode=00000000-0000-0000-0000-000000000000  IterationTime=250ms
MaxIterationCount=20  MemoryRandomization=Default  MinIterationCount=15
WarmupCount=1
Method Toolchain Mean Error Ratio Allocated Alloc Ratio
EmptyAsyncMethodInvocation Main 5.055 ns 0.0462 ns 1.00 - NA
EmptyAsyncMethodInvocation PR 5.032 ns 0.0071 ns 1.00 - NA
SingleYieldMethodInvocation Main 356.205 ns 4.1456 ns 1.00 96 B 1.00
SingleYieldMethodInvocation PR 354.236 ns 2.2640 ns 0.99 96 B 1.00
Yield Main 176.770 ns 1.1851 ns 1.00 - NA
Yield PR 180.182 ns 1.5086 ns 1.02 - NA
System.Threading.Tasks.ValueTaskPerfTest
BenchmarkDotNet v0.14.1-nightly.20250107.205, Linux Ubuntu 22.04.5 LTS (Jammy Jellyfish)
AMD EPYC 9V74, 1 CPU, 8 logical and 4 physical cores
  Job-NIYAAS : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-HPQHWZ : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-FQHDCZ : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-VNBZDK : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
PowerPlanMode=00000000-0000-0000-0000-000000000000  IterationTime=250ms  MaxIterationCount=20
MaxWarmupIterationCount=10  MinIterationCount=15  MinWarmupIterationCount=2
WarmupCount=-1
Method Toolchain Mean Error Ratio Allocated Alloc Ratio
Await_FromResult Main 5.312 ns 0.0101 ns 1.00 - NA
Await_FromResult PR 5.427 ns 0.0080 ns 1.02 - NA
Await_FromCompletedTask Main 11.415 ns 0.0515 ns 1.00 72 B 1.00
Await_FromCompletedTask PR 11.822 ns 0.0382 ns 1.04 72 B 1.00
Await_FromCompletedValueTaskSource Main 18.028 ns 0.0472 ns 1.00 72 B 1.00
Await_FromCompletedValueTaskSource PR 17.061 ns 0.0455 ns 0.95 72 B 1.00
CreateAndAwait_FromResult Main 5.339 ns 0.0066 ns 1.00 - NA
CreateAndAwait_FromResult PR 5.456 ns 0.0076 ns 1.02 - NA
CreateAndAwait_FromResult_ConfigureAwait Main 5.312 ns 0.0070 ns 1.00 - NA
CreateAndAwait_FromResult_ConfigureAwait PR 5.320 ns 0.0288 ns 1.00 - NA
CreateAndAwait_FromCompletedTask Main 7.039 ns 0.0104 ns 1.00 - NA
CreateAndAwait_FromCompletedTask PR 7.264 ns 0.0055 ns 1.03 - NA
CreateAndAwait_FromCompletedTask_ConfigureAwait Main 7.924 ns 0.0175 ns 1.00 - NA
CreateAndAwait_FromCompletedTask_ConfigureAwait PR 8.186 ns 0.0067 ns 1.03 - NA
CreateAndAwait_FromCompletedValueTaskSource Main 8.413 ns 0.0199 ns 1.00 - NA
CreateAndAwait_FromCompletedValueTaskSource PR 8.418 ns 0.0171 ns 1.00 - NA
CreateAndAwait_FromYieldingAsyncMethod Main 539.190 ns 15.0818 ns 1.00 208 B 1.00
CreateAndAwait_FromYieldingAsyncMethod PR 518.297 ns 28.7200 ns 0.96 208 B 1.00
CreateAndAwait_FromDelayedTCS Main 83.099 ns 0.3351 ns 1.00 216 B 1.00
CreateAndAwait_FromDelayedTCS PR 83.483 ns 0.3325 ns 1.00 216 B 1.00
Copy_PassAsArgumentAndReturn_FromResult Main 1.961 ns 0.0021 ns 1.00 - NA
Copy_PassAsArgumentAndReturn_FromResult PR 1.961 ns 0.0024 ns 1.00 - NA
Copy_PassAsArgumentAndReturn_FromTask Main 3.046 ns 0.0037 ns 1.00 - NA
Copy_PassAsArgumentAndReturn_FromTask PR 3.045 ns 0.0044 ns 1.00 - NA
Copy_PassAsArgumentAndReturn_FromValueTaskSource Main 6.780 ns 0.0047 ns 1.00 - NA
Copy_PassAsArgumentAndReturn_FromValueTaskSource PR 6.776 ns 0.0029 ns 1.00 - NA
CreateAndAwait_FromCompletedValueTaskSource_ConfigureAwait Main 11.717 ns 0.0231 ns 1.00 - NA
CreateAndAwait_FromCompletedValueTaskSource_ConfigureAwait PR 11.751 ns 0.0497 ns 1.00 - NA
System.Threading.Channels.Tests.UnboundedChannelPerfTests
BenchmarkDotNet v0.14.1-nightly.20250107.205, Linux Ubuntu 22.04.5 LTS (Jammy Jellyfish)
AMD EPYC 9V74, 1 CPU, 8 logical and 4 physical cores
  Job-TLUDJO : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-QKIVUI : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
OutlierMode=Default  PowerPlanMode=00000000-0000-0000-0000-000000000000  IterationTime=250ms
MaxIterationCount=20  MemoryRandomization=Default  MinIterationCount=15
WarmupCount=1
Method Toolchain Mean Error Ratio Allocated Alloc Ratio
TryWriteThenTryRead Main 18.70 ns 0.033 ns 1.00 - NA
TryWriteThenTryRead PR 19.91 ns 0.015 ns 1.07 - NA
WriteAsyncThenReadAsync Main 26.03 ns 0.018 ns 1.00 - NA
WriteAsyncThenReadAsync PR 26.30 ns 0.031 ns 1.01 - NA
ReadAsyncThenWriteAsync Main 42.82 ns 0.090 ns 1.00 - NA
ReadAsyncThenWriteAsync PR 44.77 ns 0.073 ns 1.05 - NA
PingPong Main 7,263,334.83 ns 124,832.259 ns 1.00 1010 B 1.00
PingPong PR 7,341,565.94 ns 138,260.388 ns 1.01 1140 B 1.13
System.Threading.Channels.Tests.SpscUnboundedChannelPerfTests
BenchmarkDotNet v0.14.1-nightly.20250107.205, Linux Ubuntu 22.04.5 LTS (Jammy Jellyfish)
AMD EPYC 9V74, 1 CPU, 8 logical and 4 physical cores
  Job-TLUDJO : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-QKIVUI : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
OutlierMode=Default  PowerPlanMode=00000000-0000-0000-0000-000000000000  IterationTime=250ms
MaxIterationCount=20  MemoryRandomization=Default  MinIterationCount=15
WarmupCount=1
Method Toolchain Mean Error Ratio Allocated Alloc Ratio
TryWriteThenTryRead Main 20.29 ns 0.128 ns 1.00 - NA
TryWriteThenTryRead PR 20.68 ns 0.103 ns 1.02 - NA
WriteAsyncThenReadAsync Main 31.46 ns 0.106 ns 1.00 - NA
WriteAsyncThenReadAsync PR 31.48 ns 0.028 ns 1.00 - NA
ReadAsyncThenWriteAsync Main 39.89 ns 0.262 ns 1.00 - NA
ReadAsyncThenWriteAsync PR 42.03 ns 0.038 ns 1.05 - NA
PingPong Main 7,000,869.83 ns 45,956.310 ns 1.00 996 B 1.00
PingPong PR 7,385,030.19 ns 144,106.706 ns 1.05 1140 B 1.14
System.Threading.Channels.Tests.BoundedChannelPerfTests
BenchmarkDotNet v0.14.1-nightly.20250107.205, Linux Ubuntu 22.04.5 LTS (Jammy Jellyfish)
AMD EPYC 9V74, 1 CPU, 8 logical and 4 physical cores
  Job-TLUDJO : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-QKIVUI : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
OutlierMode=Default  PowerPlanMode=00000000-0000-0000-0000-000000000000  IterationTime=250ms
MaxIterationCount=20  MemoryRandomization=Default  MinIterationCount=15
WarmupCount=1
Method Toolchain Mean Error Ratio Allocated Alloc Ratio
TryWriteThenTryRead Main 27.73 ns 0.033 ns 1.00 - NA
TryWriteThenTryRead PR 30.12 ns 0.061 ns 1.09 - NA
WriteAsyncThenReadAsync Main 35.47 ns 0.196 ns 1.00 - NA
WriteAsyncThenReadAsync PR 36.46 ns 0.134 ns 1.03 - NA
ReadAsyncThenWriteAsync Main 38.84 ns 0.173 ns 1.00 - NA
ReadAsyncThenWriteAsync PR 45.22 ns 0.032 ns 1.16 - NA
PingPong Main 7,189,249.38 ns 49,509.942 ns 1.00 1014 B 1.00
PingPong PR 7,537,659.60 ns 121,117.126 ns 1.05 1123 B 1.11

@jkoritzinsky removed the "blocked" label (Issue/PR is blocked on something - see comments) on Sep 10, 2025

@jkoritzinsky
Member Author

I ran these benchmarks locally and used the perf team's ResultsComparer tooling (with their recommended 2% threshold) and got the following results:

summary:
better: 23, geomean: 1.099
worse: 11, geomean: 1.093
total diff: 34

| Slower | diff/base | Base Median (ns) | Diff Median (ns) | Modality |
|---|---|---|---|---|
| System.Threading.Tasks.Tests.Perf_AsyncMethods.Yield | 1.43 | 157.16 | 224.16 | |
| System.Threading.Channels.Tests.UnboundedChannelPerfTests.TryWriteThenTryRead | 1.14 | 29.64 | 33.64 | |
| System.Collections.Concurrent.AddRemoveFromDifferentThreads.ConcurrentSt | 1.11 | 89349750.00 | 98731550.00 | several? |
| System.Threading.Tasks.ValueTaskPerfTest.CreateAndAwait_FromResult_ConfigureAwai | 1.07 | 5.32 | 5.70 | |
| System.Threading.Tests.Perf_Timer.AsynchronousContention | 1.07 | 4311523100.00 | 4596094950.00 | |
| System.Threading.Tasks.ValueTaskPerfTest.CreateAndAwait_FromCompletedValueTaskSo | 1.06 | 8.60 | 9.14 | |
| System.Threading.Channels.Tests.BoundedChannelPerfTests.ReadAsyncThenWriteAsync | 1.06 | 60.48 | 63.94 | bimodal |
| System.Threading.Tasks.Tests.Perf_AsyncMethods.SingleYieldMethodInvocation | 1.05 | 496.56 | 519.55 | |
| System.Threading.Tasks.ValueTaskPerfTest.Await_FromCompletedTask | 1.04 | 12.26 | 12.80 | |
| System.Threading.Channels.Tests.SpscUnboundedChannelPerfTests.WriteAsyncThenRead | 1.04 | 31.39 | 32.50 | |
| System.Threading.Channels.Tests.BoundedChannelPerfTests.WriteAsyncThenReadAsync | 1.03 | 46.94 | 48.23 | |

| Faster | base/diff | Base Median (ns) | Diff Median (ns) | Modality |
|---|---|---|---|---|
| System.Collections.Concurrent.IsEmpty.Bag(Size: 512) | 1.29 | 7.57 | 5.87 | |
| System.Collections.Concurrent.IsEmpty.Queue(Size: 0) | 1.22 | 2.31 | 1.90 | |
| System.Collections.Concurrent.IsEmpty.Dictionary(Size: 512) | 1.19 | 3.11 | 2.61 | bimodal |
| System.Threading.Tests.Perf_ThreadStatic.SetThreadStatic | 1.18 | 2.20 | 1.86 | |
| System.Threading.Tasks.ValueTaskPerfTest.Copy_PassAsArgumentAndReturn_FromResult | 1.16 | 2.36 | 2.03 | |
| System.Collections.Concurrent.IsEmpty.Bag(Size: 0) | 1.16 | 7.17 | 6.18 | |
| System.Threading.Tests.Perf_SemaphoreSlim.ReleaseWait | 1.11 | 35.54 | 31.97 | |
| System.Threading.Tasks.Tests.Perf_AsyncMethods.EmptyAsyncMethodInvocation | 1.10 | 5.17 | 4.70 | |
| System.Threading.Tests.Perf_SemaphoreSlim.ReleaseWaitAsync_WithCancellationToken | 1.09 | 1570.68 | 1435.42 | |
| System.Threading.Tests.Perf_Timer.SynchronousContention | 1.08 | 3671500800.00 | 3396126000.00 | |
| System.Threading.Channels.Tests.BoundedChannelPerfTests.PingPong | 1.08 | 13669746.15 | 12649100.00 | |
| System.Threading.Tests.Perf_Monitor.TryEnterExit | 1.08 | 14.16 | 13.16 | |
| System.Threading.Channels.Tests.SpscUnboundedChannelPerfTests.TryWriteThenTryRea | 1.07 | 21.01 | 19.56 | |
| System.Threading.Tasks.ValueTaskPerfTest.CreateAndAwait_FromResult | 1.07 | 6.04 | 5.64 | |
| System.Threading.Channels.Tests.UnboundedChannelPerfTests.PingPong | 1.07 | 12876008.33 | 12026150.00 | |
| System.Threading.Tasks.ValueTaskPerfTest.CreateAndAwait_FromCompletedValueTaskSo | 1.06 | 14.69 | 13.84 | |
| System.Threading.Tests.Perf_Monitor.EnterExit | 1.05 | 13.85 | 13.13 | |
| System.Collections.Concurrent.Count.Queue_EnqueueCountDequeue(Size: 512) | 1.05 | 17.20 | 16.32 | |
| System.Collections.Concurrent.Count.Bag(Size: 512) | 1.05 | 32.46 | 30.81 | |
| System.Collections.Concurrent.Count.Bag(Size: 512) | 1.04 | 32.17 | 30.88 | |
| System.Threading.Tests.Perf_Timer.ScheduleManyThenDisposeMany | 1.04 | 487751500.00 | 470460600.00 | |
| System.Threading.Channels.Tests.SpscUnboundedChannelPerfTests.ReadAsyncThenWrite | 1.03 | 61.28 | 59.30 | |
| System.Collections.Concurrent.Count.Stack(Size: 512) | 1.03 | 478.25 | 464.98 | |

The Task and ValueTask regressions seem unrelated to this change; the only benchmarks that are consistently slower are the Channels ones.

I'm not sure whether the differences from the MihuBot results are due to different core counts, the fact that MihuBot runs on cloud VMs, or the natural instability of multithreading benchmarks.
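
As a rough illustration of the comparison described above, the sketch below applies a 2% threshold to base and diff medians and takes a geometric mean of the ratios in each group. It is not the ResultsComparer implementation, which may do more (for example statistical testing); `classify` and the other names are hypothetical, and the three rows are copied from the results above.

```python
from statistics import geometric_mean

THRESHOLD = 0.02  # the 2% threshold mentioned in the comment


def classify(base_ns: float, diff_ns: float, threshold: float = THRESHOLD) -> str:
    """Label the PR (diff) build as 'slower', 'faster', or 'same' versus base."""
    if diff_ns / base_ns > 1 + threshold:
        return "slower"
    if base_ns / diff_ns > 1 + threshold:
        return "faster"
    return "same"


# (benchmark name, base median ns, diff median ns), taken from the results above.
rows = [
    ("System.Threading.Tasks.Tests.Perf_AsyncMethods.Yield", 157.16, 224.16),
    ("System.Collections.Concurrent.IsEmpty.Bag(Size: 512)", 7.57, 5.87),
    ("System.Threading.Tests.Perf_Monitor.EnterExit", 13.85, 13.13),
]

labels = {name: classify(base, diff) for name, base, diff in rows}

# Each group reports the ratio that is greater than 1 for that group:
# diff/base for slower benchmarks, base/diff for faster ones.
slower_ratios = [d / b for _, b, d in rows if classify(b, d) == "slower"]
faster_ratios = [b / d for _, b, d in rows if classify(b, d) == "faster"]

slower_geomean = geometric_mean(slower_ratios)
faster_geomean = geometric_mean(faster_ratios)

for name, label in labels.items():
    print(f"{label:>6}  {name}")
```

With only three sample rows the geomeans will not match the summary figures, which cover all 34 differing benchmarks.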

@jkotas
Member

jkotas commented Sep 14, 2025

/azp run runtime-nativeaot-outerloop, runtime-coreclr outerloop, runtime-coreclr gcstress-extra, runtime-coreclr gcstress0x3-gcstress0xc


Azure Pipelines successfully started running 4 pipeline(s).

Labels
area-VM-coreclr, linkable-framework (Issues associated with delivering a linker friendly framework)