Skip to content

Conversation

Erarndt
Copy link
Contributor

@Erarndt Erarndt commented Aug 27, 2025

Fixes #

Context

This is a modification of #12113 that uses a SemaphoreSlim without a cap on the number of times it can be released. The previous code checked the count property to see if Release() needed to be called since calling it when there is a maximum specified can throw an exception if we're already at the max.

However, this pattern has a potential race condition if more than one thread tries to enqueue a packet at once. Multiple threads read 0 from CurrentCount and try to call Release() which causes an exception to be thrown. Additionally, the ARM64 architecture has a weaker memory model than x86/64 which can lead to unintended behavior. The .NET runtime maintains its own memory model rules that ensure correctness across platforms. Of note in this case is this section about order of memory operations.

"The effects of ordinary reads and writes can be reordered as long as that preserves single-thread consistency."

Since the runtime only needs to guarantee single-thread consistency, the read of something like CurrentCount could be read before the packet is enqueued, and this could lead to DrainPacketQueue never being signaled.

Changes Made

Testing

Notes

@Copilot Copilot AI review requested due to automatic review settings August 27, 2025 17:18
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR replaces the problematic race condition-prone pattern of using a bounded SemaphoreSlim with count checking with an unbounded SemaphoreSlim in the NodeContext packet queue system. The change addresses thread safety issues that could occur when multiple threads attempt to enqueue packets simultaneously, particularly on ARM64 architectures with weaker memory models.

Key changes:

  • Replace AutoResetEvent and CancellationTokenSource with unbounded SemaphoreSlim for packet queue signaling
  • Convert packet draining from dedicated thread to async Task-based approach
  • Add memory barriers around packet enqueue operations to ensure proper ordering
  • Refactor shutdown logic to use proper resource disposal patterns

@@ -789,30 +787,33 @@ public void SendData(INodePacket packet)
{
_exitPacketState = ExitPacketState.ExitPacketQueued;
}

Thread.MemoryBarrier();
Copy link
Preview

Copilot AI Aug 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thread.MemoryBarrier() creates a full memory fence which can be expensive. Consider using Interlocked operations or volatile semantics if specific ordering guarantees are needed, as the .NET memory model already provides sufficient guarantees for most scenarios.

Copilot generated this review using guidance from repository custom instructions.

_packetWriteQueue.Enqueue(packet);
_packetEnqueued.Set();

Thread.MemoryBarrier();
Copy link
Preview

Copilot AI Aug 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second Thread.MemoryBarrier() call may be unnecessary. The SemaphoreSlim.Release() method already provides appropriate memory ordering guarantees for signaling the waiting thread.

Copilot generated this review using guidance from repository custom instructions.

using (var cts = new CancellationTokenSource(100))
{
await Task.WhenAny(_packetWriteDrainTask, Task.Delay(100, cts.Token));
cts.Cancel();
Copy link
Preview

Copilot AI Aug 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling cts.Cancel() after the Task.WhenAny completes is unnecessary and potentially problematic. The CancellationTokenSource is already disposed in the using block, and cancellation should only be called if you need to interrupt the delay task early.

Suggested change
cts.Cancel();

Copilot uses AI. Check for mistakes.

Comment on lines +862 to +866

if (packet is NodeBuildComplete)
{
return;
}
Copy link
Preview

Copilot AI Aug 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition duplicates exit logic that's already handled by IsExitPacket(). Consider consolidating the exit conditions or clearly documenting why NodeBuildComplete requires separate handling from other exit packets.

Suggested change
if (packet is NodeBuildComplete)
{
return;
}
// If NodeBuildComplete packets with PrepareForReuse == true require special handling,
// document the reason here. Otherwise, rely on IsExitPacket for all exit logic.

Copilot uses AI. Check for mistakes.

@SimaTian
Copy link
Member

SimaTian commented Sep 1, 2025

This is a non-trivial change and we already got burned. We will dig into it deeper.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants