stems.io.encoding module

stems.io.encoding.encoding_chunksizes(xarr, chunks=None)

Find/resolve chunksize for a DataArray

Parameters

xarr (xarray.DataArray) – DataArray to consider
chunks (tuple[int] or Mapping[str, int]) – Chunks per dimension

Returns

Chunksizes per dimension

Return type

tuple[int]
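
As a rough illustration of how a chunks argument given as either a tuple or a mapping might be resolved to one chunk size per dimension, consider the plain-Python sketch below. The helper name resolve_chunks is hypothetical, and the rule that dimensions missing from a mapping default to 1 is borrowed from the netcdf_encoding description; this is not the stems implementation.

```python
from collections.abc import Mapping


def resolve_chunks(dims, chunks):
    """Hypothetical helper: return one chunk size per dimension name in `dims`."""
    if isinstance(chunks, Mapping):
        # Assumption: dimensions missing from the mapping default to a chunk of 1
        return tuple(int(chunks.get(dim, 1)) for dim in dims)
    # Otherwise treat `chunks` as a sequence with one entry per dimension
    chunks = tuple(int(c) for c in chunks)
    if len(chunks) != len(dims):
        raise ValueError("Expected one chunk size per dimension")
    return chunks


dims = ("time", "y", "x")
from_mapping = resolve_chunks(dims, {"y": 256, "x": 256})
from_tuple = resolve_chunks(dims, (1, 256, 256))
```

Under these assumptions, both calls resolve to the same per-dimension tuple.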

stems.io.encoding.encoding_dtype(xarr)

Get dtype encoding info

Parameters: xarr (xarray.DataArray or np.ndarray) – DataArray to consider
Returns: Datatype information for encoding (e.g., {'dtype': np.float32})
Return type: dict[str, np.dtype]

stems.io.encoding.encoding_name(xarr)

Return the name of the variable to provide encoding for

Returns either the name of the DataArray or the name that xarray will assign it when writing to disk (xarray.backends.api.DATAARRAY_VARIABLE).

Parameters: xarr (xarray.DataArray) – DataArray whose encoding variable name is returned
Returns: Encoding variable name
Return type: str
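
The fallback described above can be sketched without importing xarray. The function name encoding_name_sketch is hypothetical, it takes the name directly rather than a DataArray, and the constant's value is copied here on the assumption that it matches xarray.backends.api.DATAARRAY_VARIABLE in the installed xarray version.

```python
# Assumption: this mirrors the value of xarray.backends.api.DATAARRAY_VARIABLE
DATAARRAY_VARIABLE = "__xarray_dataarray_variable__"


def encoding_name_sketch(name):
    """Hypothetical helper: pick the variable name used as the encoding key."""
    # A named DataArray keeps its name; an unnamed one gets xarray's placeholder
    return name if name is not None else DATAARRAY_VARIABLE


named = encoding_name_sketch("ndvi")
unnamed = encoding_name_sketch(None)
```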

stems.io.encoding.guard_chunksizes(xarr, chunksizes)

Guard chunk sizes so that each is <= the size of its dimension

Parameters

xarr (xarray.DataArray) – DataArray to consider
chunksizes (tuple[int]) – Chunks per dimension

Returns

Guarded chunksizes

Return type

tuple[int]
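
A minimal sketch of this kind of guard, assuming it amounts to clamping each chunk size to the length of its dimension. The helper clamp_chunksizes is hypothetical and works on a plain shape tuple rather than a DataArray.

```python
def clamp_chunksizes(shape, chunksizes):
    """Hypothetical helper: limit each chunk size to the size of its dimension."""
    return tuple(min(int(chunk), int(size)) for chunk, size in zip(chunksizes, shape))


# A 10 x 100 x 300 array with requested chunks of (1, 256, 256):
# the second chunk exceeds its dimension (100) and is reduced to fit
guarded = clamp_chunksizes((10, 100, 300), (1, 256, 256))
```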

stems.io.encoding.guard_chunksizes_str(xarr, chunksizes)

Guard chunk sizes for str datatypes

Chunks for str data need to include the string length dimension because netCDF4-python represents, for example, a 1D char array as a 2D array

Parameters

xarr (xarray.DataArray) – DataArray to consider
chunksizes (tuple[int]) – Chunk sizes per dimension

Returns

Guarded chunk sizes

Return type

tuple[int]
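
To make the extra dimension concrete, the sketch below appends a string-length entry to the chunk sizes of a 1D sequence of strings. The helper add_string_length_chunk is hypothetical, and both the choice of the longest UTF-8 encoded length and the placement of the new entry last are assumptions, not confirmed stems behavior.

```python
def add_string_length_chunk(values, chunksizes):
    """Hypothetical helper: extend chunk sizes with a string-length dimension."""
    # Assumption: the char dimension is as long as the longest encoded string
    str_len = max((len(v.encode("utf-8")) for v in values), default=1)
    # Assumption: the string-length dimension comes last
    return tuple(chunksizes) + (str_len,)


labels = ["red", "green", "blue"]
guarded_str = add_string_length_chunk(labels, (3,))
```

Here the 1D chunk specification gains a second entry, matching the 2D char array representation described above.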

stems.io.encoding.guard_dtype(xarr, dtype_)

Guard dtype encoding for datetime datatypes

Parameters

xarr (xarray.DataArray or np.ndarray) – DataArray to consider
dtype_ (dict[str, np.dtype]) – Datatype information for encoding (e.g., {'dtype': np.float32})

Returns

Datatype information for encoding (e.g., {'dtype': np.float32}) if valid; otherwise an empty dict

Return type

dict[str, np.dtype]

stems.io.encoding.netcdf_encoding(data, chunks=None, zlib=True, complevel=4, nodata=None, **encoding_kwds)

Return “good” NetCDF encoding information for some data

The returned encoding is the default or “good” known standard for data used in stems. Each default determined in this function is given as a keyword argument to allow overriding, and you can also pass additional encoding items via **encoding_kwds. You may pass one override for all data, or overrides for each data variable (as a dict).

For more information, see the NetCDF4 documentation for createVariable [1].

Parameters

data (xr.DataArray or xr.Dataset) – Define encoding for this data. If xr.Dataset, map function across all xr.DataArray in data.data_vars
dtype (np.dtype, optional) – The data type used for the encoded data. Defaults to the input data type(s), but can be set to facilitate discretization based compression (typically alongside scale_factor and _FillValue)
chunks (None, tuple or dict, optional) – Chunksizes used to encode NetCDF. If given as a tuple, chunks should be given for each dimension. Chunks for dimensions not specified when given as a dict will default to 1. Passing False will not use chunks.
zlib (bool, optional) – Use compression
complevel (int, optional) – Compression level
nodata (int, float, or sequence, optional) – NoDataValue(s). Specify one for each DataArray in data if data is an xarray.Dataset. Used for _FillValue
encoding_kwds (dict) – Additional encoding data to pass

Returns

Dict mapping band name (e.g., variable name) to relevant encoding information

Return type

dict
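
The shape of the returned mapping can be sketched in plain Python. The function encoding_sketch below is a hypothetical stand-in that applies one set of settings to every variable name. It does not inspect any data, and it omits the per-variable overrides and chunk guarding that the real function describes.

```python
def encoding_sketch(var_names, chunksizes=None, zlib=True, complevel=4,
                    nodata=None, **encoding_kwds):
    """Hypothetical stand-in: build {variable name: encoding dict}."""
    encoding = {}
    for name in var_names:
        enc = {"zlib": zlib, "complevel": complevel}
        if chunksizes:
            enc["chunksizes"] = tuple(chunksizes)
        if nodata is not None:
            # The NoDataValue is stored under the _FillValue encoding key
            enc["_FillValue"] = nodata
        # Extra items (e.g., dtype, scale_factor) are passed through unchanged
        enc.update(encoding_kwds)
        encoding[name] = enc
    return encoding


encoding = encoding_sketch(
    ["red", "nir"], chunksizes=(1, 256, 256), nodata=-9999, dtype="int16"
)
```

A nested dict of this form, keyed by variable name, is what xarray accepts as the encoding argument when writing with to_netcdf.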