KeyNotFoundException in DistributedData Replicator.Notify() due to inconsistent subscription state #7804

@Aaronontheweb

Version Information

  • Akka.NET version: 1.5.38 (also affects current main branch)
  • Platform: .NET (all platforms)

Describe the bug

The Replicator.Notify() method throws a KeyNotFoundException when trying to access _subscriptionKeys[keyId], due to inconsistent state between the subscription tracking collections.

User-Observed State Inconsistency

As reported by the user who encountered this issue:

"As far as I could see, in ReceiveFlushChanges() _subscribers, as well as _changed has entries for the key 'skill-statuses', but _subscriptionKeys does not. That's why var key = _subscriptionKeys[keyId]; fails in Notify()"

This reveals the core issue: inconsistent state where:

  • _subscribers contains the key
  • _changed contains the key
  • _subscriptionKeys does NOT contain the key

Root Cause Analysis

Problematic code location: the unsubscribe cleanup logic in the Replicator.

The issue appears to be in the boolean logic of the unsubscribe cleanup condition. The OR condition can remove entries from _subscriptionKeys even when active subscribers still exist in _subscribers.
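
The difference between an OR-based and an AND-based cleanup condition can be sketched in isolation. This is a minimal illustration, not the actual Replicator source: the method names are hypothetical, and the second input stands for the pending new-subscribers collection, whose real field name does not appear in this report. The AND variant is one plausible correction, not a confirmed fix.

```csharp
using System;

public static class CleanupCondition
{
    // Condition as described in this issue: the key is dropped when EITHER
    // collection has no entry for it.
    public static bool ShouldRemoveKeyWithOr(bool inSubscribers, bool inPendingSubscribers)
        => !inSubscribers || !inPendingSubscribers;

    // Candidate correction (an assumption): drop the key only when NEITHER
    // collection has an entry for it.
    public static bool ShouldRemoveKeyWithAnd(bool inSubscribers, bool inPendingSubscribers)
        => !inSubscribers && !inPendingSubscribers;

    public static void Main()
    {
        // State after a flush and a partial unsubscribe: active subscribers
        // remain, while the pending collection was already cleared.
        Console.WriteLine(ShouldRemoveKeyWithOr(true, false));  // True  (key dropped too early)
        Console.WriteLine(ShouldRemoveKeyWithAnd(true, false)); // False (key kept)
    }
}
```

With the AND form, the key is only dropped once the last subscriber in both collections is gone.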

Timeline of the bug:

  1. Actors subscribe to a DData key (e.g., "skill-statuses")
  2. ReceiveFlushChanges() processes new subscribers, moving them from the pending new-subscribers collection to _subscribers and clearing the pending collection
  3. Some (but not all) actors unsubscribe from the key
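  4. The unsubscribe handler evaluates the OR condition:
      • The "no entry in _subscribers" check is false (subscribers still exist)
      • The "no entry in the pending collection" check is true (it was cleared earlier)
      • The OR condition evaluates to true, incorrectly removing the key from _subscriptionKeys
  5. Later, gossip updates or data changes add the key to _changed (because _subscribers still contains it)
  6. ReceiveFlushChanges() iterates over _changed and calls Notify()
  7. Notify() tries to access _subscriptionKeys[keyId], but the key was incorrectly removed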
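
The timeline above can be replayed with a small standalone model. This is a sketch under stated assumptions, not the Akka.NET implementation: ReplicatorModel and its methods are hypothetical, actors are represented by plain strings, and _newSubscribers is an assumed name for the pending collection. Only _subscribers, _changed, _subscriptionKeys and the failing indexer access come from the report.

```csharp
using System;
using System.Collections.Generic;

public sealed class ReplicatorModel
{
    private readonly Dictionary<string, HashSet<string>> _subscribers = new Dictionary<string, HashSet<string>>();
    // Assumed name for the pending new-subscribers collection.
    private readonly Dictionary<string, HashSet<string>> _newSubscribers = new Dictionary<string, HashSet<string>>();
    private readonly Dictionary<string, string> _subscriptionKeys = new Dictionary<string, string>();
    private readonly HashSet<string> _changed = new HashSet<string>();

    public void Subscribe(string keyId, string actor)
    {
        if (!_newSubscribers.TryGetValue(keyId, out var pending))
            _newSubscribers[keyId] = pending = new HashSet<string>();
        pending.Add(actor);
        _subscriptionKeys[keyId] = keyId;
    }

    public void Unsubscribe(string keyId, string actor)
    {
        if (_subscribers.TryGetValue(keyId, out var active) && active.Remove(actor) && active.Count == 0)
            _subscribers.Remove(keyId);
        if (_newSubscribers.TryGetValue(keyId, out var pending) && pending.Remove(actor) && pending.Count == 0)
            _newSubscribers.Remove(keyId);

        // OR condition as described in the issue: true whenever either collection lacks the key.
        if (!_subscribers.ContainsKey(keyId) || !_newSubscribers.ContainsKey(keyId))
            _subscriptionKeys.Remove(keyId);
    }

    public void DataChanged(string keyId)
    {
        // A change is only tracked while someone is subscribed.
        if (_subscribers.ContainsKey(keyId)) _changed.Add(keyId);
    }

    public void FlushChanges()
    {
        foreach (var keyId in _changed) Notify(keyId);

        // Promote pending subscribers to active ones, then clear the pending collection.
        foreach (var entry in _newSubscribers)
        {
            if (!_subscribers.TryGetValue(entry.Key, out var active))
                _subscribers[entry.Key] = active = new HashSet<string>();
            active.UnionWith(entry.Value);
        }
        _newSubscribers.Clear();
        _changed.Clear();
    }

    private void Notify(string keyId)
    {
        // Same access pattern as the failing line quoted in the report.
        var key = _subscriptionKeys[keyId];
    }
}

public static class Program
{
    public static string Run()
    {
        var replicator = new ReplicatorModel();
        replicator.Subscribe("skill-statuses", "actor-a");   // step 1
        replicator.Subscribe("skill-statuses", "actor-b");
        replicator.FlushChanges();                            // step 2
        replicator.Unsubscribe("skill-statuses", "actor-a");  // steps 3-4
        replicator.DataChanged("skill-statuses");             // step 5
        try
        {
            replicator.FlushChanges();                        // steps 6-7
            return "no error";
        }
        catch (KeyNotFoundException)
        {
            return "KeyNotFoundException";
        }
    }

    public static void Main() => Console.WriteLine(Run());
}
```

In this model the failure is deterministic because the steps run in a fixed order; in a real cluster the same sequence depends on message timing.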

To Reproduce

Difficult to reproduce consistently - it's a timing-dependent issue that requires:

  1. Multiple actors subscribing to the same DData key
  2. Subscription processing (FlushChanges) that moves subscribers from the pending new-subscribers collection to _subscribers
  3. Partial unsubscription (some but not all subscribers unsubscribe)
  4. Subsequent data updates that trigger change notifications

Example user code that triggers this:

Expected behavior

The Replicator should maintain consistent state across its subscription tracking collections (including _subscribers and _subscriptionKeys), and should not throw KeyNotFoundException during normal operation.
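
The expected consistency can be stated as an invariant: every key id present in _subscribers or _changed must also be present in _subscriptionKeys. The helper below is a hypothetical sketch of such a check that could be used in a test or debug assertion. SubscriptionInvariant and FindOrphanedKeys are illustrative names, not part of Akka.NET.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SubscriptionInvariant
{
    // Returns key ids that are tracked as subscribed or changed
    // but have no matching entry in the subscription keys.
    public static List<string> FindOrphanedKeys(
        IEnumerable<string> subscriberKeyIds,
        IEnumerable<string> changedKeyIds,
        ICollection<string> subscriptionKeyIds)
    {
        return subscriberKeyIds
            .Concat(changedKeyIds)
            .Distinct()
            .Where(id => !subscriptionKeyIds.Contains(id))
            .ToList();
    }

    public static void Main()
    {
        // The state reported by the user: the key is in _subscribers and
        // _changed, but missing from _subscriptionKeys.
        var orphaned = FindOrphanedKeys(
            new[] { "skill-statuses" },
            new[] { "skill-statuses" },
            new HashSet<string>());
        Console.WriteLine(string.Join(",", orphaned)); // skill-statuses
    }
}
```

An empty result means the three collections agree; any returned id would fail the indexer lookup in Notify().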

Additional context

  • This is a critical production issue that causes Replicator actors to crash
  • The issue is intermittent and timing-dependent, making it hard to reproduce in tests
  • User is running long-lived distributed systems where this inconsistent state eventually surfaces
  • The user observation of the internal state proves the collections are getting out of sync
