Ceph stress testing runs into "icechunk.IcechunkError: × session error: error writing object to object store service error" #1197

@petermkr


What happened?

When trying to run highly parallel code against a Ceph S3-compatible object store with Icechunk, the program exited with the following error message:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib64/python3.13/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ~~~~^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.13/multiprocessing/pool.py", line 51, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "/home/username/icechunk-demo/demo/demo06_icechunk_parallel_partial_write.py", line 100, in worker
    session.commit(message)
    ~~~~~~~~~~~~~~^^^^^^^^^
  File "/home/username/.cache/pypoetry/virtualenvs/icechunk-demo-ZY7yPLKK-py3.13/lib64/python3.13/site-packages/icechunk/session.py", line 271, in commit
    return self._session.commit(
           ~~~~~~~~~~~~~~~~~~~~^
        message, metadata, rebase_with=rebase_with, rebase_tries=rebase_tries
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
icechunk.IcechunkError: × session error: error writing object to object store service error

  │ context:
  │ 0: icechunk::storage::s3::write_ref
  │      with ref_key="branch.main/ref.json" previous_version=VersionInfo { etag: Some(ETag(""7a9369dcfbc377183fef8e9d8f0bf40e"")), generation: None }
  │      at icechunk/src/storage/s3.rs:819
  │ 1: icechunk::refs::update_branch
  │      with name="main" new_snapshot=6C5N1DRCE5SP0CV9MJCG current_snapshot=Some(0GYPX3ZH15N48YG7V8K0)
  │      at icechunk/src/refs.rs:173
  │ 2: icechunk::session::_commit
  │      with Worker (init_time=18, lead_time_index=03): Wrote chunk data. rewrite_manifests=false
  │      at icechunk/src/session.rs:985
  │ 3: icechunk::session::commit
  │      with Worker (init_time=18, lead_time_index=03): Wrote chunk data.
  │      at icechunk/src/session.rs:948

"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/username/icechunk-demo/demo/demo06_icechunk_parallel_partial_write.py", line 199, in <module>
    main()
    ~~~~^^
  File "/home/username/icechunk-demo/demo/demo06_icechunk_parallel_partial_write.py", line 168, in main
    stats: list[tuple[int, int, int, int, float, float, float]] = pool.starmap(worker, tasks)
                                                                  ~~~~~~~~~~~~^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.13/multiprocessing/pool.py", line 375, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/usr/lib64/python3.13/multiprocessing/pool.py", line 774, in get
    raise self._value
icechunk.IcechunkError: × session error: error writing object to object store service error

  │ context:
  │ 0: icechunk::storage::s3::write_ref
  │      with ref_key="branch.main/ref.json" previous_version=VersionInfo { etag: Some(ETag(""7a9369dcfbc377183fef8e9d8f0bf40e"")), generation: None }
  │      at icechunk/src/storage/s3.rs:819
  │ 1: icechunk::refs::update_branch
  │      with name="main" new_snapshot=6C5N1DRCE5SP0CV9MJCG current_snapshot=Some(0GYPX3ZH15N48YG7V8K0)
  │      at icechunk/src/refs.rs:173
  │ 2: icechunk::session::_commit
  │      with Worker (init_time=18, lead_time_index=03): Wrote chunk data. rewrite_manifests=false
  │      at icechunk/src/session.rs:985
  │ 3: icechunk::session::commit
  │      with Worker (init_time=18, lead_time_index=03): Wrote chunk data.
  │      at icechunk/src/session.rs:948

What did you expect to happen?
The commit should succeed against a Ceph Object Gateway backend.

Minimal Complete Verifiable Example
Since I have a concrete idea of where things go wrong (see the analysis below), I have omitted the example. If you need it, just let me know.
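Even without the full demo, the access pattern that triggers the error can be sketched. Below is a minimal, self-contained simulation (all names hypothetical; the real S3 branch ref is replaced by an in-memory stand-in) of many workers doing read-modify-write commits on a single branch ref via conditional writes. The point is that under contention a failed conditional write is an expected outcome that must be retried, not a hard error:

```python
import threading

class FakeRefStore:
    """Hypothetical in-memory stand-in for the branch ref object
    (branch.main/ref.json): a conditional put that only succeeds
    when the caller's expected version still matches."""
    def __init__(self):
        self.value = None
        self.version = 0
        self._lock = threading.Lock()

    def conditional_put(self, expected_version, new_value):
        with self._lock:
            if self.version != expected_version:
                return False  # analogous to an HTTP 409/412 from the store
            self.value = new_value
            self.version += 1
            return True

def commit(store, snapshot):
    # Read-modify-write with retry on conflict, roughly how a commit
    # loop has to behave on top of conditional writes.
    while True:
        seen = store.version
        if store.conditional_put(seen, snapshot):
            return

store = FakeRefStore()
threads = [threading.Thread(target=commit, args=(store, f"snap-{i}"))
           for i in range(16)]
for t in threads: t.start()
for t in threads: t.join()
print(store.version)  # 16 -- every commit lands exactly once
```

If a conflict were instead raised as an error (as happens on Ceph, per the analysis below under "Anything else"), some of these workers would abort rather than retry.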

Anything else we need to know?
I have analyzed the problem and found the following:

In this code (part of write_ref in s3.rs),

        match res {
            Ok(_) => Ok(WriteRefResult::Written),
            Err(err) => {
                let code = err.as_service_error().and_then(|e| e.code()).unwrap_or("");
                if code.contains("PreconditionFailed")
                    || code.contains("ConditionalRequestConflict")
                {
                    Ok(WriteRefResult::WontOverwrite)
                } else {
                    Err(Box::new(err).into())
                }
            }
        }

there are exactly two error codes that cause the failed conditional write to be classified as a benign conflict (WriteRefResult::WontOverwrite) and retried; anything else is propagated as a hard error.
While Ceph correctly returns PreconditionFailed for HTTP 412, it uses the code ConcurrentModification for HTTP 409, as verified with Wireshark:

[Wireshark capture: Ceph RGW responds with HTTP 409 and error code ConcurrentModification]

This means that on Ceph the 409 response takes the "real error path" and the commit aborts instead of being retried.

If you agree with this analysis, would it be possible to add ConcurrentModification to the or-condition in the code above, or, more generally, to treat any HTTP 409 response as a conflict?
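For illustration, here is the proposed relaxed classification as a Python sketch of the Rust match above. The status/code pairs are taken from this report; treating any 409 or 412 as a conflict is the suggestion, not current icechunk behavior, and the function name is hypothetical:

```python
def is_conditional_write_conflict(http_status: int, error_code: str) -> bool:
    """Sketch: classify a failed conditional write as a retryable
    conflict (WontOverwrite) rather than a hard error."""
    # Generic check: both 409 (Conflict) and 412 (Precondition Failed)
    # signal that another writer won the race, regardless of the
    # vendor-specific error code string.
    if http_status in (409, 412):
        return True
    # Fallback on the error code for clients that surface only the code:
    return error_code in (
        "PreconditionFailed",          # AWS S3 and Ceph, HTTP 412
        "ConditionalRequestConflict",  # AWS S3, HTTP 409
        "ConcurrentModification",      # Ceph RGW, HTTP 409 (this report)
    )

print(is_conditional_write_conflict(409, "ConcurrentModification"))  # True
print(is_conditional_write_conflict(500, "InternalError"))           # False
```

Checking the HTTP status first would make the code robust against further vendor-specific code strings from other S3-compatible stores.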

Environment
platform: Fedora Linux 42 (Workstation Edition) (6.15.10-200.fc42.x86_64)
python: 3.13.7
icechunk: 1.1.4
zarr: 3.1.0
numcodecs: 0.16.2
xarray: 2025.8.0

Thanks for the great tool!
