Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 12 additions & 0 deletions CHANGES.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,15 @@
## Version 0.7.0 (in development)

* Made writing custom slice sources easier: (#82)

- Slice items can now be a `contextlib.AbstractContextManager`
so custom slice functions can now be used with
[@contextlib.contextmanager](https://docs.python.org/3/library/contextlib.html#contextlib.contextmanager).

- Introduced `SliceSource.close()` so
[contextlib.closing()](https://docs.python.org/3/library/contextlib.html#contextlib.closing)
is applicable. Deprecated `SliceSource.dispose()`.

## Version 0.6.0 (from 2024-03-12)

### Enhancements
Expand Down
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -61,8 +61,8 @@ The `zappend` tool provides the following features:
[`zappend`](cli.md) command or from Python. When used from Python using the
[`zappend()`](api.md) function, slice datasets can be passed as local file
paths, URIs, as datasets of type
[xarray.Dataset](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html), or as custom
[zappend.api.SliceSource](https://bcdev.github.io/zappend/api/#class-slicesource) objects.
[xarray.Dataset](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html), or as custom
[slice sources](https://bcdev.github.io/zappend/guide/#slice-sources).


More about zappend can be found in its
Expand Down
2 changes: 1 addition & 1 deletion docs/config.md
Original file line number Diff line number Diff line change
Expand Up @@ -193,7 +193,7 @@ Options for the filesystem given by the URI of `target_dir`.
## `slice_source`

Type _string_.
The fully qualified name of a class or function that provides a slice source for each slice item. If a class is given, it must be derived from `zappend.api.SliceSource`. If a function is given, it must return an instance of `zappend.api.SliceSource`. Refer to the user guide for more information.
The fully qualified name of a class or function that receives a slice item as argument(s) and provides the slice dataset. If a class is given, it must be derived from `zappend.api.SliceSource`. If the function is a context manager, it must yield an `xarray.Dataset`. If a plain function is given, it must return any valid slice item type. Refer to the user guide for more information.

## `slice_engine`

Expand Down
47 changes: 42 additions & 5 deletions docs/guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -694,10 +694,14 @@ at the cost of additional i/o. It therefore defaults to `false`.

If you need some custom cleanup after a slice has been processed and appended to the
target dataset, you can use instances of `zappend.api.SliceSource` as slice items.
A `SliceSource` class requires you to implement two methods:
The `SliceSource` methods with special meaning are:

* `get_dataset()` to return the slice dataset of type `xarray.Dataset`, and
* `dispose()` to perform any resource cleanup tasks.
* `get_dataset()`: a zero-argument method that returns the slice dataset of type
`xarray.Dataset`. You must implement this abstract method.
* `close()`: perform any resource cleanup tasks
(in zappend < v0.7, the `close` method was called `dispose`).
* `__init__()`: optional constructor that receives any arguments passed to the
slice source.

Here is the template code for your own slice source implementation:

Expand All @@ -718,7 +722,7 @@ class MySliceSource(SliceSource):
# You can put any processing here.
return self.ds

def dispose(self):
def close(self):
# Write code here that performs cleanup.
if self.ds is not None:
self.ds.close()
Expand Down Expand Up @@ -795,7 +799,7 @@ class MySliceSource(SliceSource):
self.ds = xr.open_dataset(self.slice_path)
return get_mean_slice(self.ds)

def dispose(self):
def close(self):
if self.ds is not None:
self.ds.close()
self.ds = None
Expand All @@ -805,6 +809,39 @@ zappend(["slice-1.nc", "slice-2.nc", "slice-3.nc"],
slice_source=MySliceSource)
```

Since zappend 0.7, a slice source can also be written as a Python
[context manager](https://docs.python.org/3/library/contextlib.html),
which allows you implementing the `get_dataset()` and `close()`
methods in one single function, instead of a class. Here is the above example
written as context manager.

```python
from contextlib import contextmanager
import numpy as np
import xarray as xr
from zappend.api import zappend

# Same as above here

@contextmanager
def get_slice_dataset(slice_path):
# allocate resources here
ds = xr.open_dataset(slice_path)
mean_ds = get_mean_slice(ds)
try:
# yield (!) the slice dataset
# so it can be appended
yield mean_ds
finally:
# after slice dataset has been appended
# release resources here
ds.close()

zappend(["slice-1.nc", "slice-2.nc", "slice-3.nc"],
target_dir="target.zarr",
slice_source=get_slice_dataset)
```

## Profiling

Runtime profiling is very important for understanding program runtime behavior
Expand Down
4 changes: 2 additions & 2 deletions docs/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -39,8 +39,8 @@ The `zappend` tool provides the following features:
[`zappend`](cli.md) command or from Python. When used from Python using the
[`zappend()`](api.md) function, slice datasets can be passed as local file
paths, URIs, as datasets of type
[xarray.Dataset](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html), or as custom
[zappend.api.SliceSource](https://bcdev.github.io/zappend/api/#class-slicesource) objects.
[xarray.Dataset](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html), or as custom
[slice sources](guide#slice-sources).

## How It Works

Expand Down
118 changes: 115 additions & 3 deletions tests/slice/test_cm.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@
# Permissions are hereby granted under the terms of the MIT License:
# https://opensource.org/licenses/MIT.

import contextlib
import shutil
import unittest
import warnings
Expand All @@ -13,6 +14,7 @@
from zappend.fsutil.fileobj import FileObj
from zappend.slice.cm import SliceSourceContextManager
from zappend.slice.cm import open_slice_dataset
from zappend.slice.source import SliceSource
from zappend.slice.sources.memory import MemorySliceSource
from zappend.slice.sources.persistent import PersistentSliceSource
from zappend.slice.sources.temporary import TemporarySliceSource
Expand All @@ -23,17 +25,17 @@
# noinspection PyUnusedLocal


# noinspection PyShadowingBuiltins
# noinspection PyShadowingBuiltins,PyRedeclaration
class OpenSliceDatasetTest(unittest.TestCase):
def setUp(self):
clear_memory_fs()

# noinspection PyMethodMayBeStatic
def test_slice_item_is_slice_source(self):
dataset = make_test_dataset()
ctx = Context(dict(target_dir="memory://target.zarr"))
slice_item = MemorySliceSource(dataset, 0)
slice_cm = open_slice_dataset(ctx, slice_item)
self.assertIsInstance(slice_cm, SliceSourceContextManager)
self.assertIs(slice_item, slice_cm.slice_source)

def test_slice_item_is_dataset(self):
Expand Down Expand Up @@ -127,7 +129,6 @@ def test_slice_item_is_uri_with_polling_ok(self):
with slice_cm as slice_ds:
self.assertIsInstance(slice_ds, xr.Dataset)

# noinspection PyMethodMayBeStatic
def test_slice_item_is_uri_with_polling_fail(self):
slice_dir = FileObj("memory://slice.zarr")
ctx = Context(
Expand All @@ -140,3 +141,114 @@ def test_slice_item_is_uri_with_polling_fail(self):
with pytest.raises(FileNotFoundError, match=slice_dir.uri):
with slice_cm:
pass

def test_slice_item_is_context_manager(self):
@contextlib.contextmanager
def get_dataset(name):
uri = f"memory://{name}.zarr"
ds = make_test_dataset(uri=uri)
try:
yield ds
finally:
ds.close()
FileObj(uri).delete(recursive=True)

ctx = Context(
dict(
target_dir="memory://target.zarr",
slice_source=get_dataset,
)
)
slice_cm = open_slice_dataset(ctx, "bibo")
self.assertIsInstance(slice_cm, contextlib.AbstractContextManager)
with slice_cm as slice_ds:
self.assertIsInstance(slice_ds, xr.Dataset)

def test_slice_item_is_slice_source(self):
class MySliceSource(SliceSource):
def __init__(self, name):
self.uri = f"memory://{name}.zarr"
self.ds = None

def get_dataset(self):
self.ds = make_test_dataset(uri=self.uri)
return self.ds

def close(self):
if self.ds is not None:
self.ds.close()
FileObj(uri=self.uri).delete(recursive=True)

ctx = Context(
dict(
target_dir="memory://target.zarr",
slice_source=MySliceSource,
)
)
slice_cm = open_slice_dataset(ctx, "bibo")
self.assertIsInstance(slice_cm, SliceSourceContextManager)
self.assertIsInstance(slice_cm.slice_source, SliceSource)
with slice_cm as slice_ds:
self.assertIsInstance(slice_ds, xr.Dataset)

def test_slice_item_is_deprecated_slice_source(self):
class MySliceSource(SliceSource):
def __init__(self, name):
self.uri = f"memory://{name}.zarr"
self.ds = None

def get_dataset(self):
self.ds = make_test_dataset(uri=self.uri)
return self.ds

def dispose(self):
if self.ds is not None:
self.ds.close()
FileObj(uri=self.uri).delete(recursive=True)

ctx = Context(
dict(
target_dir="memory://target.zarr",
slice_source=MySliceSource,
)
)
slice_cm = open_slice_dataset(ctx, "bibo")
self.assertIsInstance(slice_cm, SliceSourceContextManager)
self.assertIsInstance(slice_cm.slice_source, SliceSource)
with pytest.warns(expected_warning=DeprecationWarning):
with slice_cm as slice_ds:
self.assertIsInstance(slice_ds, xr.Dataset)


class IsContextManagerTest(unittest.TestCase):
"""Assert that context managers are identified by isinstance()"""

def test_context_manager_class(self):
@contextlib.contextmanager
def my_slice_source(data):
ds = xr.Dataset(data)
try:
yield ds
finally:
ds.close()

item = my_slice_source([1, 2, 3])
self.assertTrue(isinstance(item, contextlib.AbstractContextManager))
self.assertFalse(isinstance(my_slice_source, contextlib.AbstractContextManager))

def test_context_manager_protocol(self):
class MySliceSource:
def __enter__(self):
return xr.Dataset()

def __exit__(self, *exc):
pass

item = MySliceSource()
self.assertTrue(isinstance(item, contextlib.AbstractContextManager))
self.assertFalse(isinstance(MySliceSource, contextlib.AbstractContextManager))

def test_dataset(self):
item = xr.Dataset()
self.assertTrue(isinstance(item, contextlib.AbstractContextManager))
self.assertFalse(isinstance(xr.Dataset, contextlib.AbstractContextManager))
38 changes: 28 additions & 10 deletions tests/slice/test_source.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,24 +2,19 @@
# Permissions are hereby granted under the terms of the MIT License:
# https://opensource.org/licenses/MIT.

import shutil
import unittest
import warnings

import pytest
import xarray as xr

from zappend.context import Context
from zappend.fsutil.fileobj import FileObj
from zappend.slice.cm import SliceSourceContextManager
from zappend.slice.cm import open_slice_dataset
from zappend.slice.source import to_slice_source, SliceSource
from zappend.slice.source import SliceSource
from zappend.slice.source import to_slice_source
from zappend.slice.sources.memory import MemorySliceSource
from zappend.slice.sources.persistent import PersistentSliceSource
from zappend.slice.sources.temporary import TemporarySliceSource
from tests.helpers import clear_memory_fs
from tests.helpers import make_test_dataset
from tests.config.test_config import CustomSliceSource


# noinspection PyUnusedLocal
Expand Down Expand Up @@ -86,22 +81,43 @@ def my_slice_source(arg1, arg2=None, ctx=None):
return xr.Dataset(attrs=dict(arg1=arg1, arg2=arg2, ctx=ctx))

ctx = make_ctx(slice_source=my_slice_source)
arg = xr.Dataset()
slice_source = to_slice_source(ctx, ([13], {"arg2": True}), 0)
self.assertIsInstance(slice_source, MemorySliceSource)
ds = slice_source.get_dataset()
self.assertEqual(13, ds.attrs.get("arg1"))
self.assertEqual(True, ds.attrs.get("arg2"))
self.assertIs(ctx, ds.attrs.get("ctx"))

def test_slice_item_is_slice_source_context_manager(self):
import contextlib

@contextlib.contextmanager
def my_slice_source(ctx, arg1, arg2=None):
_ds = xr.Dataset(attrs=dict(arg1=arg1, arg2=arg2, ctx=ctx))
try:
yield _ds
finally:
_ds.close()

ctx = make_ctx(slice_source=my_slice_source)
slice_source = to_slice_source(ctx, ([14], {"arg2": "OK"}), 0)
self.assertIsInstance(slice_source, contextlib.AbstractContextManager)
with slice_source as ds:
self.assertIsInstance(ds, xr.Dataset)
self.assertEqual(14, ds.attrs.get("arg1"))
self.assertEqual("OK", ds.attrs.get("arg2"))
self.assertIs(ctx, ds.attrs.get("ctx"))

# noinspection PyMethodMayBeStatic
def test_raises_if_slice_item_is_int(self):
ctx = make_ctx(persist_mem_slices=True)
with pytest.raises(
TypeError,
match=(
"slice_item must have type str, xarray.Dataset,"
" zappend.api.FileObj, zappend.api.SliceSource, but was type int"
" contextlib.AbstractContextManager,"
" zappend.api.FileObj, zappend.api.SliceSource,"
" but was type int"
),
):
to_slice_source(ctx, 42, 0)
Expand All @@ -116,7 +132,9 @@ def hallo():
TypeError,
match=(
"slice_item must have type str, xarray.Dataset,"
" zappend.api.FileObj, zappend.api.SliceSource, but was type function"
" contextlib.AbstractContextManager,"
" zappend.api.FileObj, zappend.api.SliceSource,"
" but was type function"
),
):
to_slice_source(ctx, hallo, 0)
Loading