Merged
19 changes: 19 additions & 0 deletions CHANGES.md
@@ -2,15 +2,34 @@

### Enhancements

* The configuration setting `attrs` can now be used to define dynamically
computed dataset attributes using the syntax `{{ expression }}`. [#60]

Example:
```yaml
permit_eval: true
attrs:
  title: HROC Ocean Colour Monthly Composite
  # Quoted so that YAML reads the value as a string, not as a flow mapping
  time_coverage_start: "{{ lower_bound(ds.time) }}"
  time_coverage_end: "{{ upper_bound(ds.time) }}"
```

* Introduced new configuration setting `attrs_update_mode` that controls
how dataset attributes are updated. [#59]

* Simplified logging to console. You can now set the configuration setting
`logging` to a log level, which implicitly enables console logging at that
level. [#64]

* Added CLI option `--traceback`. [#57]

* Added a section in the notebook `examples/zappend-demo.ipynb`
that demonstrates transaction rollbacks.


## Version 0.4.1 (from 2024-02-13)

30 changes: 30 additions & 0 deletions docs/config.md
@@ -143,6 +143,36 @@ Variable metadata.
Type _object_.
Arbitrary variable metadata attributes.

## `attrs`

Type _object_.
Arbitrary dataset attributes. If `permit_eval` is set to `true`, string values may include Python expressions enclosed in `{{` and `}}` to dynamically compute attribute values; in the expression, the current dataset is named `ds`. Refer to the user guide for more information.

## `attrs_update_mode`

The mode used to update target attributes from slice attributes. Independently of this setting, extra attributes configured by the `attrs` setting will finally be used to update the resulting target attributes.
Must be one of the following:

* Use attributes from first slice dataset and keep them.
Its value is `"keep"`.

* Replace existing attributes by attributes of last slice dataset.
Its value is `"replace"`.

* Update existing attributes by attributes of last slice dataset.
Its value is `"update"`.

* Ignore attributes from slice datasets.
Its value is `"ignore"`.

Defaults to `"keep"`.

## `permit_eval`

Type _boolean_.
Allow for dynamically computed values in dataset attributes `attrs` using the syntax `{{ expression }}`. Executing arbitrary Python expressions is a security risk, therefore this must be explicitly enabled. Refer to the user guide for more information.
Defaults to `false`.

## `target_dir`

Type _string_.
83 changes: 82 additions & 1 deletion docs/guide.md
@@ -84,7 +84,9 @@ The remainder of this guide explains how to use the various `zappend`
variables. A variable comprises the actual data array as well as metadata describing
the data dimensions, units, and encoding, such as chunking and compression.

## Dataset Metadata

### Outline

If no further configuration is supplied, then the target dataset's outline and data
encoding is fully prescribed by the first slice dataset provided. By default, the
@@ -152,6 +154,85 @@ Often, it is easier to specify which variables should be excluded:
"excluded_variables": ["GridCellId"]
}
```
### Attributes

The target dataset should expose information about itself using global
metadata attributes.
There are four choices to update the global attributes of the target
dataset from slices. The configuration setting `attrs_update_mode`
controls how this is done:

* `"keep"` - use attributes from first slice dataset and keep them (default);
* `"replace"` - replace existing attributes by attributes of last slice dataset;
* `"update"` - update existing attributes by attributes of last slice dataset;
* `"ignore"` - ignore attributes from slice datasets.

Extra attributes can be added using the optional configuration setting `attrs`:

```json
{
    "attrs_update_mode": "keep",
    "attrs": {
        "Conventions": "CF-1.10",
        "title": "SMOS Level 2C Soil Moisture 2-Days Composite"
    }
}
```

Independently of the `attrs_update_mode` setting, extra attributes configured
by the `attrs` setting will always be used to update the resulting target
attributes.
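
As a rough illustration of how the four modes and the `attrs` setting interact, the following Python sketch replays the described behaviour with plain dictionaries. The function `resolve_attrs` and its parameters are hypothetical and not part of the `zappend` API. It only mirrors the mode descriptions given above, so the actual implementation may differ in detail.

```python
from typing import Any, Optional, Sequence


def resolve_attrs(
    slice_attrs_seq: Sequence[dict[str, Any]],
    mode: str = "keep",
    extra_attrs: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Return the target attributes after appending all slices in order.

    Hypothetical helper: an interpretation of the documented modes,
    not zappend's own code.
    """
    target: dict[str, Any] = {}
    for index, slice_attrs in enumerate(slice_attrs_seq):
        if mode == "keep":
            # Only the first slice contributes attributes.
            if index == 0:
                target = dict(slice_attrs)
        elif mode == "replace":
            # The latest slice wins entirely.
            target = dict(slice_attrs)
        elif mode == "update":
            # The latest slice overrides keys, other keys survive.
            target.update(slice_attrs)
        elif mode == "ignore":
            # Slice attributes never reach the target.
            pass
        else:
            raise ValueError(f"unknown attrs_update_mode: {mode!r}")
        # Extra attributes are applied last, whatever the mode.
        target.update(extra_attrs or {})
    return target


slices = [
    {"title": "A", "source": "s1"},
    {"title": "B", "history": "h2"},
]
for mode in ("keep", "replace", "update", "ignore"):
    print(mode, resolve_attrs(slices, mode, {"Conventions": "CF-1.10"}))
```

In this sketch, `"keep"` and `"ignore"` differ only in how the first slice is treated, while `"replace"` and `"update"` differ in whether attributes missing from the last slice are retained.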

Attribute values in the `attrs` setting may also be computed dynamically using
the syntax `{{ expression }}`, where `expression` is an arbitrary Python
expression. For this to work, the setting `permit_eval` must be explicitly
set for security reasons:

```json
{
    "permit_eval": true,
    "attrs_update_mode": "keep",
    "attrs": {
        "time_coverage_start": "{{ ds.time[0] }}",
        "time_coverage_end": "{{ ds.time[-1] }}"
    }
}
```

Currently, the only variable accessible from expressions is `ds` which is
a reference to the current state of the target dataset after the last slice
append. It is of type
[xarray.Dataset](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html).

!!! danger "Evil eval()"
    The expressions in `{{ expression }}` are evaluated using the Python
    [eval() function](https://docs.python.org/3/library/functions.html#eval).
    This can pose a threat to your application and environment.
    Although `zappend` does not allow you to directly access Python built-in
    functions via expressions, `permit_eval` should be used judiciously and
    with extreme caution if `zappend` is part of a web service whose
    configuration is injected from outside your network.
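
To make the mechanism more concrete, here is a minimal sketch of how a `{{ expression }}` template could be evaluated with `eval()` while hiding the built-in functions. The helper `eval_attr_value` is hypothetical and is not how `zappend` is known to implement this. In particular, the rule that a value consisting of a single expression keeps its Python type is an assumption made for this example.

```python
import re
from typing import Any, Mapping

# Matches one "{{ expression }}" placeholder (non-greedy, single line).
_EXPR_PATTERN = re.compile(r"\{\{(.*?)\}\}")


def eval_attr_value(value: Any, env: Mapping[str, Any]) -> Any:
    """Evaluate placeholders in an attribute value (illustrative only).

    `env` provides the names visible to expressions, for example `ds`.
    """
    if not isinstance(value, str):
        return value

    def evaluate(expression: str) -> Any:
        # An empty "__builtins__" stops eval() from injecting the built-ins,
        # so names such as len() or open() are undefined in expressions.
        return eval(expression.strip(), {"__builtins__": {}}, dict(env))

    matches = list(_EXPR_PATTERN.finditer(value))
    if len(matches) == 1 and matches[0].group(0) == value.strip():
        # The whole value is one expression: keep the result's type.
        return evaluate(matches[0].group(1))
    # Otherwise interpolate each result into the surrounding text.
    return _EXPR_PATTERN.sub(lambda m: str(evaluate(m.group(1))), value)


# A plain dict stands in for the xarray.Dataset named `ds`.
env = {"ds": {"start": "2024-01-01", "end": "2024-01-31"}}
print(eval_attr_value("{{ ds['start'] }}", env))
print(eval_attr_value("from {{ ds['start'] }} to {{ ds['end'] }}", env))
```

Hiding the built-ins narrows what an expression can reach, but it does not turn `eval()` into a sandbox, because objects passed in through `env` can still expose attributes and methods. This is why the setting has to be enabled explicitly.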

The following utility functions can be used as well and are handy if you need
to store the upper and lower bounds of coordinates as attribute values:

* `lower_bound(array, ref: "lower"|"upper"|"center" = "lower")`:
Return the lower bound of a one-dimensional (coordinate) array `array`.
* `upper_bound(array, ref: "lower"|"upper"|"center" = "lower")`:
Return the upper bound of a one-dimensional (coordinate) array `array`.

The `ref` value specifies the reference within an array element that is used
as a basis for the boundary computation. E.g., if coordinate labels refer to
array element centers, pass `ref="center"`.

```json
{
    "permit_eval": true,
    "attrs": {
        "time_coverage_start": "{{ lower_bound(ds.time, 'center') }}",
        "time_coverage_end": "{{ upper_bound(ds.time, 'center') }}"
    }
}
```
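
The effect of `ref` can be illustrated with a small stand-alone sketch. The functions `lower_bound_sketch` and `upper_bound_sketch` below are hypothetical re-creations for illustration, not `zappend`'s own functions. They assume an ascending coordinate with uniform spacing and at least two elements, and they derive the cell size from the first two labels, which may differ from what `zappend` actually computes.

```python
from typing import Sequence

# Position of the label inside its cell, as a fraction of the cell size.
_REF_OFFSETS = {"lower": 0.0, "center": 0.5, "upper": 1.0}


def _cell_size(array: Sequence[float]) -> float:
    # Assumption: uniform spacing, so the first two labels define the cell size.
    if len(array) < 2:
        raise ValueError("need at least two coordinate labels")
    return array[1] - array[0]


def _offset(ref: str) -> float:
    if ref not in _REF_OFFSETS:
        raise ValueError(f"ref must be one of {sorted(_REF_OFFSETS)}")
    return _REF_OFFSETS[ref]


def lower_bound_sketch(array: Sequence[float], ref: str = "lower") -> float:
    """Lower edge of the first cell of a 1-D coordinate."""
    return array[0] - _offset(ref) * _cell_size(array)


def upper_bound_sketch(array: Sequence[float], ref: str = "lower") -> float:
    """Upper edge of the last cell of a 1-D coordinate."""
    return array[-1] + (1.0 - _offset(ref)) * _cell_size(array)


coords = [1.0, 2.0, 3.0]
for ref in ("lower", "center", "upper"):
    print(ref, lower_bound_sketch(coords, ref), upper_bound_sketch(coords, ref))
```

Under these assumptions the covered extent always spans three cells of size 1. Only its position shifts with `ref`, which is why labels that mark cell centers call for `ref="center"`.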

## Variable Metadata

Empty file added tests/config/__init__.py
Empty file.