[Feature] Log-structured grain storage #9450
We have been using this library internally for a few proof-of-concept projects and I want it to be available via NuGet in an experimental, alpha form so that we and others can experiment more with it, provide feedback, and help to mature the library.
0cdc6eb to e8fe1ac
Does this mean that if we have a collection as the state, this can be used to append to the collection without resending the entire collection? E.g., would it be represented as a proper list in the Redis persistence provider? For that case it would also be useful to 'remove' a single item from the collection (random, dequeue, pop, based on collection type).
Yes to appending without resending the entire collection. No to database-native representation - this represents state as opaque log entries (like a database write-ahead log), not as DB-native data structures. It's possible that state could be represented as a single Redis hash or list, but the contents would be opaque blobs. One alternative would be to replace the implementations of IDurableX in the container, but you'd need a fallback for unknown state machine types and for types which don't have good representations in the target database. Then, during storage operations, you would need to use a transaction to load all state or store all changes atomically. Conceptually doable, but not something I've been considering for this PR, and probably not something which could be supported well. Why not use Redis APIs from your grain instead?
I wonder what I'd be missing by doing that instead of using Orleans persistence, for any kind of state actually.
Mostly just convenience - Orleans loads the state for you during activation and manages concurrency using ETags to prevent lost writes, etc. We often encourage people to go directly to their database when they have needs not satisfied by that simple persistence model. Edit: it also keeps the state in memory while the grain is active, allowing you to serve reads without needing to perform IO. When you manage storage in application code, you need to handle all of this yourself. That may not be a burden for some, but it adds complexity to the application as well as room for error.
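For contrast, the simple persistence model described above looks roughly like this with the existing `IPersistentState<T>` API (the grain and state names here are illustrative):

```csharp
using System.Threading.Tasks;
using Orleans;
using Orleans.Runtime;

public sealed class ProfileState
{
    public string Name { get; set; } = "";
}

public interface IProfileGrain : IGrainWithStringKey
{
    Task<string> GetName();
    Task SetName(string name);
}

// Illustrative grain using the existing persistence model: Orleans loads the
// state during activation, keeps it in memory, and uses ETags on writes.
public sealed class ProfileGrain : Grain, IProfileGrain
{
    private readonly IPersistentState<ProfileState> _profile;

    public ProfileGrain(
        [PersistentState("profile", "profileStore")] IPersistentState<ProfileState> profile)
        => _profile = profile;

    // Reads are served from the in-memory copy; no storage IO.
    public Task<string> GetName() => Task.FromResult(_profile.State.Name);

    public async Task SetName(string name)
    {
        _profile.State.Name = name;
        await _profile.WriteStateAsync(); // ETag check prevents lost writes
    }
}
```

Everything here (activation-time load, in-memory reads, ETag-checked writes) is what you would re-implement yourself when going directly to the database.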
Pull Request Overview
This pull request adds an experimental log-structured journaling framework for Orleans grain state persistence, along with a suite of durable state machine types and an Azure Append Blob storage provider for durable logging.
- Introduces a new Orleans.Journaling project that defines durable state machine interfaces and implementations (e.g. DurableValue, DurableList, DurableDictionary, etc.).
- Implements an Azure Append Blob storage provider for atomic persistence and recovery, together with hosting extensions and serialization improvements.
Reviewed Changes
Copilot reviewed 57 out of 59 changed files in this pull request and generated 2 comments.
File | Description
---|---
HostingExtensions.cs | Registers scoped services for state machine storage and various durable types.
DurableValue.cs, DurableTaskCompletionSource.cs, DurableState.cs | Implements durable state machine patterns with version checking and logging of state changes.
DurableSet.cs, DurableQueue.cs, DurableList.cs, DurableDictionary.cs | Provides durable collection implementations using new C# collection initialization syntax.
DurableGrain.cs | Provides a base class to support dependency injection and lifecycle management for durable state machines.
Azure/... | Implements Azure Append Blob storage provider and related hosting extensions for journaling persistence.
MigrationContext.cs | Updates disposal behavior to reset and clear buffer state.
Files not reviewed (2)
- Orleans.sln: Language not supported
- src/Azure/Orleans.Journaling.AzureStorage/Orleans.Journaling.AzureStorage.csproj: Language not supported
Co-authored-by: Copilot <[email protected]>
/// <summary>
/// Container name where state machine state is stored.
/// </summary>
public string ContainerName { get; set; } = DEFAULT_CONTAINER_NAME;

public const string DEFAULT_CONTAINER_NAME = "state";
Does this container name have to be globally unique for the storage account? Should we prepend something here to avoid collisions?
It's configurable, so developers can configure it how they like (e.g., prefixing it with a unique id, possibly based on ServiceId). They can also provide a factory via the BuildContainerFactory property below to customize it on a per-grain basis.
They have options already, but we could set a different default value or potentially prefix it with the ServiceId automatically. Which do you prefer?
If a collision error will be obvious, I guess it's fine to leave as is.
{
    throw new InvalidOperationException("Registering a state machine after activation is invalid");
    /*
    // Re-enqueue the work item without completing it, while waiting for the state machine manager to be initialized.
Should we re-enqueue instead of throwing? With some mitigation for infinite looping...
I'll delete the comment in the code. It's an enhancement we should implement in the future (maybe before marking this as stable), but IMO we shouldn't block on it.
/// <summary>
/// Resets this instance, returning all memory.
/// </summary>
public void Reset()
Is this safe if there's an ArcBuffer that has a reference to this memory? We unpin the page and return it to the page pool but the ArcBuffer referencing it doesn't AFAICT know that the page may be being reused. The version check is only used during pin/unpin but not on access.
Unpin decrements the ref count, but if there is an ArcBufferSlice that references a page, it will be keeping the ref count above zero, so this is safe (sans implementation bugs).
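To make the lifetime rule concrete, here is a minimal, self-contained sketch of the atomic reference-counting idea described above. The types and names are illustrative only, not the actual ArcBuffer implementation:

```csharp
using System;
using System.Threading;

// Illustrative page with an atomic reference count: a "slice" takes a lease
// (increments the count), and the page is only returned to the pool once the
// count reaches zero, even if the writer has already unpinned it.
sealed class Page
{
    private int _refCount = 1; // the writer's initial pin

    public void AddRef() => Interlocked.Increment(ref _refCount);

    public void Release()
    {
        if (Interlocked.Decrement(ref _refCount) == 0)
        {
            // Only now is it safe to reuse this memory.
            Console.WriteLine("page returned to pool");
        }
    }
}

class Program
{
    static void Main()
    {
        var page = new Page();
        page.AddRef();   // a slice takes a lease
        page.Release();  // writer unpins: count is still 1, page stays alive
        page.Release();  // last slice released: page is returned
    }
}
```

This is why `Reset` is safe in the presence of outstanding slices: the decrement from unpinning cannot drop the count to zero while any slice still holds a lease.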
}

/// <inheritdoc/>
public void Dispose()
Are we concerned that this doesn't actually release pinned pages if there's outstanding references? I guess the class of problem here is similar to the other comment; the writer has no insight into how many ArcBuffers were issued or whether they were released yet.
That's the expected behavior. You write, someone peeks/consumes slices, you dispose, eventually (maybe a long time after, and maybe the buffer is sliced repeatedly) all slices are disposed and the pages are returned. The reason we use atomic ref counting is to separate the slice lifetime from the buffer writer lifetime.
Note that if you use this, you may hit dotnet/runtime#93438 unless you:
Summary:

This pull request introduces an experimental framework for log-structured grain state persistence in Orleans, based on the replicated state machine approach. This approach focuses on appending state changes to a log rather than overwriting the full state, offering potential benefits for state types with high write frequency or complex structures. A key advantage is the ability to perform atomic updates across all durable state managed by a single grain activation. The PR includes the core journaling infrastructure, several built-in durable collection types and state wrappers (including `IDurableValue<T>`), an initial Azure Append Blob storage provider, and a volatile in-memory provider for development and testing.

Background:
This directly implements the proposal outlined in issue #7691. Traditional grain persistence often involves reading and writing the entire grain state for one or more `IPersistentState<T>` instances. This can become inefficient for large or frequently updated state objects, and crucially, updates to multiple `IPersistentState<T>` instances within a single grain are not atomic with respect to each other.

The log-structured approach provides an alternative model: `WriteStateAsync` triggers the persistence of all pending log entries across all managed durable state machines (DSMs) as a single, atomic unit to durable storage.

Beyond atomic updates and potential performance benefits for certain workloads, this pattern also offers significant extensibility: developers can provide custom `IDurableStateMachine` implementations for specialized state requirements.

Implementation Details:
This PR delivers the core components required for this pattern:

Orleans.Journaling project: the central library defining the pattern and providing the core state management logic and built-in types.

- Core abstractions: `IDurableStateMachine`, `IStateMachineManager` (manages multiple state machines within a grain activation), and `IStateMachineStorage` (the interface for interacting with durable storage).
- `StateMachineManager`: implements the core logic for log recovery (replaying entries/snapshots), applying log entries and snapshots, buffering pending writes in memory (`LogExtentBuilder`), and coordinating atomic persistence operations (appending new log segments or writing full snapshots) via the `IStateMachineStorage`. It operates asynchronously in a dedicated work loop.
- The `IStateMachineLogWriter` interface (provided by the `StateMachineManager`).
- Built-in durable types:
  - `IDurableDictionary<K, V>`
  - `IDurableList<T>`
  - `IDurableQueue<T>`
  - `IDurableSet<T>`
  - `IDurableValue<T>` (simple scalar value wrapper)
  - `IPersistentState<T>` (implemented by `DurableState<T>`, providing compatibility with the existing grain persistence programming model for single values, backed by journaling)
  - `IDurableTaskCompletionSource<T>` (for persisting the outcome of asynchronous operations whose state needs durability)
  - `IDurableNothing` (a placeholder state machine that performs no operations, useful for marking state machines as retired)
- `DurableGrain` base class: a convenient base class that automatically handles dependency injection and lifecycle management for the `IStateMachineManager`, simplifying grain implementation. It provides the `WriteStateAsync()` method.
- `VolatileStateMachineStorageProvider`: an in-memory provider that implements `IStateMachineStorage` but does not provide cross-process durability. Ideal for testing or scenarios where persistence isn't required.
- Hosting extensions (`.AddStateMachineStorage()`) that also make the built-in durable types available via keyed DI.

Orleans.Journaling.AzureStorage project: a concrete durable storage implementation for the journaling framework.

- `AzureAppendBlobStateMachineStorageProvider`: implements `IStateMachineStorageProvider` using Azure Append Blobs. Append blobs are a natural fit for storing the sequential log segments efficiently.
- Includes `AzureAppendBlobStateMachineStorageOptions` for connection details and container configuration, and hosting extensions (`.AddAzureAppendBlobStateMachineStorage()`) for easy setup.

Serialization layer improvements:

- Introduces `ArcBufferWriter`, an efficient pooled buffer implementation optimized for building the log segments (`LogExtentBuilder`), enabling fast in-memory buffering and efficient writing to storage backends. It is based on atomic reference counting, with 'slices' taking leases on the underlying pages.

Usage Examples:
Using the new journaling persistence involves configuring a storage provider and then injecting and using the durable types within a grain that inherits from `DurableGrain`.

1. Silo configuration (example with Azure Append Blob storage):
Add the necessary NuGet packages: `Microsoft.Orleans.Journaling` and `Microsoft.Orleans.Journaling.AzureStorage`.
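The original configuration snippet did not survive the page export. A minimal sketch of what silo configuration might look like, using the `AddAzureAppendBlobStateMachineStorage` extension and `ContainerName` option named in this PR (the builder shape and credential wiring are assumptions):

```csharp
using Microsoft.Extensions.Hosting;

var builder = Host.CreateApplicationBuilder(args);

builder.UseOrleans(silo =>
{
    // Extension and option names come from this PR; how storage credentials
    // are supplied is an assumption for this sketch.
    silo.AddAzureAppendBlobStateMachineStorage(options =>
    {
        options.ContainerName = "state"; // default per AzureAppendBlobStateMachineStorageOptions
    });
});

using var host = builder.Build();
await host.RunAsync();
```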
2. Grain definition and state injection:

Define your state using the built-in durable types or custom `IDurableStateMachine` implementations, and inherit from `DurableGrain`
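The grain sample also did not survive the export. A hedged sketch of a `DurableGrain`, combining the `IDurableValue<T>`, keyed-DI, and `WriteStateAsync()` pieces named above (the injection key and member shapes are assumptions):

```csharp
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Orleans;

public interface ICounterGrain : IGrainWithStringKey
{
    ValueTask<int> Increment();
}

// Sketch: IDurableValue<T> and WriteStateAsync come from this PR; the
// keyed-DI injection shape shown here is an assumption.
public sealed class CounterGrain : DurableGrain, ICounterGrain
{
    private readonly IDurableValue<int> _count;

    public CounterGrain([FromKeyedServices("count")] IDurableValue<int> count)
        => _count = count;

    public async ValueTask<int> Increment()
    {
        _count.Value += 1;

        // Persists all pending log entries for every durable state machine in
        // this activation as a single atomic unit.
        await WriteStateAsync();
        return _count.Value;
    }
}
```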
Testing:

Comprehensive unit and integration tests are included in the `Orleans.Journaling.Tests` project. These tests cover:

- Core `StateMachineManager` logic, including log recovery and write coordination.
- Each durable type (`IDurableDictionary`, `IDurableList`, `IDurableQueue`, `IDurableSet`, `IDurableValue`, `DurableState` via `IPersistentState`, etc.), ensuring their operations are correctly logged and replayed.
- Integration with both the `VolatileStateMachineStorageProvider` and the `AzureAppendBlobStateMachineStorageProvider`, including scenarios with multiple operations, large state, and atomic updates across multiple state machines within a grain.

Note: As mentioned, this feature is introduced as experimental (`ORLEANSEXP005`). The APIs, implementation details, and performance characteristics may evolve based on feedback and further development.

Open Points / Future Work: