Commit 4bdb093

Merge pull request #2388 from rkouznetsov/lossydoc
Add clarification for the meaning of NSB
2 parents 4c28990 + b55211b commit 4bdb093

1 file changed

+93
-1
lines changed

docs/filters.md

Lines changed: 93 additions & 1 deletion
@@ -611,7 +611,99 @@ As part of its testing, the NetCDF build process creates a number of shared libr
611611
If you need a filter from that set, you may be able to set *HDF5\_PLUGIN\_PATH*
612612
to point to that directory or you may be able to copy the shared libraries out of that directory to your own location.
613613

614-
## Debugging {#filters_debug}
614+
# Lossy One-Way Filters
615+
616+
As of NetCDF version 4.8.2, the netcdf-c library supports
617+
bit-grooming filters.
618+
````
619+
Bit-grooming is a lossy compression algorithm that removes the
620+
bloat due to false-precision, those bits and bytes beyond the
621+
meaningful precision of the data. Bit Grooming is statistically
622+
unbiased, applies to all floating point numbers, and is easy to
623+
use. Bit-Grooming reduces data storage requirements by
624+
25-80%. Unlike its best-known competitor Linear Packing, Bit
625+
Grooming imposes no software overhead on users, and guarantees
626+
its precision throughout the whole floating point range
627+
[https://doi.org/10.5194/gmd-9-3199-2016].
628+
````
629+
The generic term "quantize" is used to refer collectively to the various
630+
precision-trimming algorithms. The key thing to note about quantization is that
631+
it occurs at the point of writing of data only. Since its output is
632+
legal data, it does not need to be "de-quantized" when the data is read.
633+
Because of this, quantization is not part of the standard filter
634+
mechanism and has a separate API.
635+
636+
The quantization API is currently as follows.
637+
````
638+
int nc_def_var_quantize(int ncid, int varid, int quantize_mode, int nsd);
639+
int nc_inq_var_quantize(int ncid, int varid, int *quantize_modep, int *nsdp);
640+
````
641+
The *quantize_mode* argument specifies the particular algorithm.
642+
Currently, three are supported: NC_QUANTIZE_BITGROOM, NC_QUANTIZE_GRANULARBR,
643+
and NC_QUANTIZE_BITROUND. In addition, quantization can be disabled using
644+
the value NC_NOQUANTIZE.
645+
646+
The input to ncgen or the output from ncdump supports special attributes
647+
to indicate if quantization was applied to a given variable.
648+
These attributes have the following form.
649+
````
650+
_QuantizeBitGroomNumberOfSignificantDigits = <NSD>
651+
or
652+
_QuantizeGranularBitRoundNumberOfSignificantDigits = <NSD>
653+
or
654+
_QuantizeBitRoundNumberOfSignificantBits = <NSB>
655+
````
656+
The value NSD is the number of significant (decimal) digits to keep.
657+
The value NSB is the number of bits to keep in the fraction part of an
658+
IEEE754 floating-point number. Note that NSB of QuantizeBitRound is the same as
659+
the "number of explicit mantissa bits" (https://doi.org/10.5194/gmd-9-3199-2016) and the same as
660+
the number of "keep-bits" (https://doi.org/10.5194/gmd-14-377-2021), and is therefore
661+
one less than the number of significant binary figures:
662+
`_QuantizeBitRoundNumberOfSignificantBits = 0` means one significant binary figure,
663+
`_QuantizeBitRoundNumberOfSignificantBits = 1` means two significant binary figures etc.
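
To make the meaning of NSB concrete, here is a minimal sketch of rounding a 32-bit float to NSB explicit mantissa bits. The function name `bitround_sketch` is hypothetical, and this is not the netcdf-c implementation. It assumes IEEE754 single precision, rounds ties away from zero on the bit pattern, and passes zeros, subnormals, infinities and NaNs through unchanged. Values near the largest finite float may round up to infinity.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper for illustration only; NOT the netcdf-c code.
 * Keeps nsb explicit mantissa (fraction) bits of an IEEE754 float. */
static float bitround_sketch(float v, int nsb)
{
    uint32_t bits, exponent;
    int drop;

    assert(nsb >= 0 && nsb <= 23);
    if (nsb == 23)
        return v;                        /* all 23 fraction bits are kept */

    memcpy(&bits, &v, sizeof bits);      /* reinterpret without aliasing issues */
    exponent = (bits >> 23) & 0xFFu;
    if (exponent == 0u || exponent == 0xFFu)
        return v;                        /* zero/subnormal or inf/NaN: unchanged */

    drop = 23 - nsb;                     /* number of fraction bits to discard */
    bits += (uint32_t)1 << (drop - 1);   /* add half of the last kept bit */
    bits &= ~(((uint32_t)1 << drop) - 1u); /* clear the discarded bits */

    memcpy(&v, &bits, sizeof v);
    return v;
}
```

With this sketch, `bitround_sketch(1.3f, 0)` yields `1.0f` (one significant binary figure) and `bitround_sketch(1.3f, 1)` yields `1.5f` (two significant binary figures).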
664+
665+
## Distortions introduced by lossy filters
666+
667+
Any lossy filter introduces distortions to data.
668+
The lossy filters implemented in netcdf-c introduce a distortion
669+
that can be quantified in terms of a _relative_ error. The magnitude of
670+
distortion introduced to every single value V is guaranteed to be within
671+
a certain fraction of V, expressed as 0.5 * V * 2**{-NSB}:
672+
i.e. it is 0.5V for NSB=0, 0.25V for NSB=1, 0.125V for NSB=2 etc.
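
The bound above can be evaluated directly. The following is a small sketch under the stated formula, 0.5 * |V| * 2**{-NSB}. The helper name `bitround_error_bound` is hypothetical and is not part of the netcdf-c API.

```c
#include <assert.h>

/* Hypothetical helper: upper bound on the absolute error that BitRound
 * may introduce to value v when nsb fraction bits are kept, computed
 * from the formula 0.5 * |v| * 2^(-nsb) given in the text. */
static double bitround_error_bound(double v, int nsb)
{
    double bound = 0.5 * (v < 0.0 ? -v : v);
    int i;

    assert(nsb >= 0);
    for (i = 0; i < nsb; i++)
        bound *= 0.5;                    /* multiply by 2^(-1), nsb times */
    return bound;
}
```

For example, `bitround_error_bound(1.0, 2)` evaluates to `0.125`, matching the 0.125V figure quoted for NSB=2.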
673+
674+
675+
The two other methods, BitGroom and GranularBitRound, use different definitions of _decimal precision_, though both
676+
are guaranteed to reproduce NSD decimals when printed.
677+
The margins for the relative error introduced by these methods are summarised in the table below.
678+
679+
```
680+
NSD 1 2 3 4 5 6 7
681+
682+
BitGroom
683+
Error Margin 3.1e-2 3.9e-3 4.9e-4 3.1e-5 3.8e-6 4.7e-7 -
684+
685+
GranularBitRound
686+
Error Margin 1.4e-1 1.9e-2 2.2e-3 1.4e-4 1.8e-5 2.2e-6 -
687+
688+
```
689+
690+
691+
If one defines decimal precision as in BitGroom, i.e. the introduced relative
692+
error must not exceed half of the unit at the decimal place NSD in the
693+
worst-case scenario, the following values of NSB should be used for BitRound:
694+
695+
```
696+
NSD 1 2 3 4 5 6 7
697+
NSB 3 6 9 13 16 19 23
698+
```
699+
700+
The resulting application of BitRound is as fast as BitGroom, and is free from
701+
artifacts in multipoint statistics introduced by BitGroom
702+
(see https://doi.org/10.5194/gmd-14-377-2021).
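
For callers who think in decimal digits, the NSD-to-NSB table above can be wrapped in a lookup. This is only a sketch: the function name `bitround_nsb_for_nsd` is hypothetical, the values are copied from the table in this section, and returning -1 for out-of-range input is an assumption made for illustration.

```c
/* Hypothetical helper: NSB to request from BitRound so that it matches
 * the BitGroom definition of NSD decimal digits, per the table above.
 * Returns -1 when nsd is outside the tabulated range 1..7. */
static int bitround_nsb_for_nsd(int nsd)
{
    static const int nsb_table[7] = {3, 6, 9, 13, 16, 19, 23};

    if (nsd < 1 || nsd > 7)
        return -1;
    return nsb_table[nsd - 1];
}
```

The result could then be passed as the last argument of `nc_def_var_quantize` together with `NC_QUANTIZE_BITROUND`.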
703+
704+
705+
# Debugging {#filters_debug}
706+
615707

616708
Depending on the debugger one uses, debugging plugins can be very difficult.
617709
It may be necessary to use the old printf approach for debugging the filter itself.
