Description
Version Information
- Akka.NET version: 1.5.38 (also affects current main branch)
- Platform: .NET (all platforms)
Describe the bug
The `Replicator.Notify()` method throws a `KeyNotFoundException` when trying to access `_subscriptionKeys[keyId]`, due to inconsistent state between the subscription-tracking collections.
User-Observed State Inconsistency
As reported by the user who encountered this issue:
"As far as I could see, in ReceiveFlushChanges() _subscribers, as well as _changed has entries for the key 'skill-statuses', but _subscriptionKeys does not. That's why var key = _subscriptionKeys[keyId]; fails in Notify()"
This reveals the core issue: inconsistent state where:
- `_subscribers` contains the key
- `_changed` contains the key
- `_subscriptionKeys` does NOT contain the key
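The reported failure mode can be modeled with plain dictionaries. This is a minimal sketch: the collection names follow the user report, but the element types and the real `Akka.DistributedData` internals differ.

```csharp
using System;
using System.Collections.Generic;

class InconsistentStateDemo
{
    static void Main()
    {
        // Simplified stand-ins for the Replicator's internal collections.
        var subscribers = new Dictionary<string, List<string>>(); // keyId -> subscriber refs
        var subscriptionKeys = new Dictionary<string, string>();  // keyId -> key object
        var changed = new HashSet<string>();                      // keyIds with pending notifications

        // The inconsistent state the user observed:
        subscribers["skill-statuses"] = new List<string> { "actorA" }; // present
        changed.Add("skill-statuses");                                 // present
        // "skill-statuses" was (incorrectly) removed from subscriptionKeys earlier.

        // Notify() indexes _subscriptionKeys[keyId] for every changed key:
        foreach (var keyId in changed)
        {
            try
            {
                var key = subscriptionKeys[keyId]; // throws KeyNotFoundException
                Console.WriteLine($"notify subscribers of {key}");
            }
            catch (KeyNotFoundException)
            {
                Console.WriteLine($"crash: KeyNotFoundException for key id '{keyId}'");
            }
        }
    }
}
```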
Root Cause Analysis
Problematic code location:
The issue appears to be in the boolean logic of the unsubscribe cleanup condition. The OR condition can remove entries from `_subscriptionKeys` even when active subscribers still exist in `_subscribers`.
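Assuming the cleanup condition has the shape described above (the exact source line is not reproduced in this report, and the variable names below are illustrative, not Akka.NET's actual code), the difference between the suspected buggy OR and a correct AND can be sketched as:

```csharp
// State at the moment of a partial unsubscribe:
bool keyHasSubscribers = true;     // _subscribers still contains the key
bool keyHasNewSubscribers = false; // pending new-subscriber set was cleared by FlushChanges

// Suspected buggy cleanup: OR removes the key from _subscriptionKeys
// even though active subscribers remain.
bool buggyRemove = !keyHasSubscribers || !keyHasNewSubscribers;   // evaluates to true

// Correct cleanup: remove only when NO collection still references the key.
bool correctRemove = !keyHasSubscribers && !keyHasNewSubscribers; // evaluates to false
```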
Timeline of the bug:
- Actors subscribe to a DData key (e.g., "skill-statuses")
- `ReceiveFlushChanges()` processes new subscribers, moving them from the pending new-subscriber collection into `_subscribers` and clearing the pending collection
- Some (but not all) actors unsubscribe from the key
- The unsubscribe handler evaluates the OR condition:
- the `_subscribers` check is false (subscribers still exist)
- the pending new-subscriber check is true (that collection was cleared earlier)
- The OR condition evaluates to true, incorrectly removing the key from `_subscriptionKeys`
- Later, gossip updates or data changes add the key to `_changed` (because `_subscribers` still contains it)
- `ReceiveFlushChanges()` iterates over `_changed` and calls `Notify()`
- `Notify()` tries to access `_subscriptionKeys[keyId]`, but the key was incorrectly removed
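The timeline above can be replayed end to end with a minimal model. This is a sketch under the same assumptions as before: plain dictionaries stand in for the Replicator's collections, and the OR condition mirrors the suspected buggy cleanup, not the actual Akka.NET source.

```csharp
using System;
using System.Collections.Generic;

class TimelineDemo
{
    static void Main()
    {
        var newSubscribers = new Dictionary<string, HashSet<string>>(); // pending subscribes
        var subscribers = new Dictionary<string, HashSet<string>>();
        var subscriptionKeys = new Dictionary<string, string>();
        var changed = new HashSet<string>();
        const string keyId = "skill-statuses";

        // 1. Two actors subscribe to the key.
        newSubscribers[keyId] = new HashSet<string> { "actorA", "actorB" };
        subscriptionKeys[keyId] = "ORSetKey(skill-statuses)";

        // 2. FlushChanges moves pending subscribers into subscribers and clears the pending set.
        subscribers[keyId] = newSubscribers[keyId];
        newSubscribers.Remove(keyId);

        // 3. One actor (but not all) unsubscribes.
        subscribers[keyId].Remove("actorB");

        // 4./5. Suspected buggy OR cleanup: removes the key although actorA is still subscribed.
        if (!subscribers.ContainsKey(keyId) || !newSubscribers.ContainsKey(keyId))
            subscriptionKeys.Remove(keyId);

        // 6. A data change marks the key as changed (subscribers still reference it).
        if (subscribers.ContainsKey(keyId)) changed.Add(keyId);

        // 7./8. Notify() indexes subscriptionKeys for each changed key and would crash.
        foreach (var id in changed)
            Console.WriteLine(subscriptionKeys.TryGetValue(id, out var key)
                ? $"notify subscribers of {key}"
                : $"KeyNotFoundException would be thrown for '{id}'");
    }
}
```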
To Reproduce
This is difficult to reproduce consistently; it is a timing-dependent issue that requires:
- Multiple actors subscribing to the same DData key
- Subscription processing (`FlushChanges`) that moves subscribers from the pending collection into `_subscribers`
- Partial unsubscription (some but not all subscribers unsubscribe)
- Subsequent data updates that trigger change notifications
Example user code that triggers this:
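The original user snippet was not preserved in this report. The following is a hypothetical sketch of the usage pattern that can trigger the bug: several actors subscribe to the same DData key and only some of them later unsubscribe. The key name comes from the report; the actor class and handler are illustrative.

```csharp
using Akka.Actor;
using Akka.DistributedData;

// Hypothetical subscriber actor; the real user code was not captured here.
public class SkillStatusSubscriber : ReceiveActor
{
    private readonly IActorRef _replicator = DistributedData.Get(Context.System).Replicator;
    private readonly ORSetKey<string> _key = new ORSetKey<string>("skill-statuses");

    public SkillStatusSubscriber()
    {
        // Several instances of this actor subscribe to the same key...
        _replicator.Tell(new Subscribe(_key, Self));

        Receive<Changed>(c =>
        {
            var statuses = c.Get(_key);
            // ...handle the updated ORSet here.
        });
    }

    protected override void PostStop()
    {
        // ...and some of them unsubscribe (e.g. on shutdown) while others stay
        // subscribed — the partial-unsubscribe step in the timeline above.
        _replicator.Tell(new Unsubscribe(_key, Self));
        base.PostStop();
    }
}
```

Stopping a subset of these actors while data updates keep arriving for `skill-statuses` matches the reproduction conditions listed above.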
Expected behavior
The Replicator should maintain consistent state between the `_subscribers`, `_changed`, and `_subscriptionKeys` collections, and should not throw `KeyNotFoundException` during normal operation.
Additional context
- This is a critical production issue that causes Replicator actors to crash
- The issue is intermittent and timing-dependent, making it hard to reproduce in tests
- User is running long-lived distributed systems where this inconsistent state eventually surfaces
- The user's observation of the internal state confirms that the collections are getting out of sync