stems.io.encoding module

stems.io.encoding.encoding_chunksizes(xarr, chunks=None)

Find/resolve chunksize for a DataArray

Parameters
  • xarr

  • chunks (optional)

Returns

Chunksizes per dimension

Return type

tuple[int]

stems.io.encoding.encoding_dtype(xarr)

Get dtype encoding info

Parameters

xarr (xarray.DataArray or np.ndarray) – DataArray to consider

Returns

Datatype information for encoding (e.g., {'dtype': np.float32})

Return type

dict[str, np.dtype]

stems.io.encoding.encoding_name(xarr)

Return the name of the variable to provide encoding for

Returns either the name of the DataArray or, if it has none, the name that xarray will assign it when writing to disk (xarray.backends.api.DATAARRAY_VARIABLE).

Parameters

xarr (xarray.DataArray) – Provide the name of this DataArray used for encoding

Returns

Encoding variable name

Return type

str
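The fallback described above can be sketched without importing xarray. This is only an illustration, not the stems implementation: encoding_name_sketch is a hypothetical helper, the SimpleNamespace objects stand in for DataArrays, and the constant is copied from xarray.backends.api.DATAARRAY_VARIABLE.

```python
from types import SimpleNamespace

# Value of xarray.backends.api.DATAARRAY_VARIABLE, copied here so the
# sketch runs without importing xarray.
DATAARRAY_VARIABLE = "__xarray_dataarray_variable__"


def encoding_name_sketch(xarr):
    """Hypothetical stand-in: use the array's name, else xarray's placeholder."""
    name = getattr(xarr, "name", None)
    return name if name is not None else DATAARRAY_VARIABLE


named = SimpleNamespace(name="ndvi")   # stands in for a named DataArray
unnamed = SimpleNamespace(name=None)   # stands in for an unnamed DataArray

print(encoding_name_sketch(named))
print(encoding_name_sketch(unnamed))
```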

stems.io.encoding.guard_chunksizes(xarr, chunksizes)

Guard chunk sizes so that each is <= the size of its dimension

Parameters
  • xarr

  • chunksizes

Returns

Guarded chunksizes

Return type

tuple[int]
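A minimal sketch of this kind of guard, assuming it amounts to clamping each chunk size to the length of its dimension. The helper name guard_chunksizes_sketch is hypothetical, and it takes a plain shape tuple rather than a DataArray so that it runs on its own.

```python
def guard_chunksizes_sketch(shape, chunksizes):
    """Clamp each chunk size to the length of its dimension."""
    if len(shape) != len(chunksizes):
        raise ValueError("need one chunk size per dimension")
    return tuple(
        min(int(chunk), int(size)) for chunk, size in zip(chunksizes, shape)
    )


# A (band, y, x) array that is smaller than the requested chunks in two
# of its three dimensions
shape = (3, 100, 2000)
print(guard_chunksizes_sketch(shape, (10, 256, 256)))  # (3, 100, 256)
```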

stems.io.encoding.guard_chunksizes_str(xarr, chunksizes)

Guard chunk sizes for str datatypes

Chunks for str data need to include the string length dimension because netcdf4-python represents, for example, a 1D char array as a 2D array

Parameters
  • xarr

  • chunksizes

Returns

Guarded chunk sizes

Return type

tuple[int]
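To illustrate why the extra dimension matters, the sketch below appends a chunk size for the string length when only the data dimensions were given. It is an assumption about the general idea, not the stems code: guard_chunksizes_str_sketch is a hypothetical helper, and a 1D list of str stands in for a str-typed DataArray.

```python
def guard_chunksizes_str_sketch(values, chunksizes):
    """Append a chunk size for the string-length dimension if it is missing.

    `values` is a 1D list of str standing in for a str-typed DataArray.
    """
    ndim = 1  # the stand-in data is one-dimensional
    chunksizes = tuple(chunksizes)
    if len(chunksizes) == ndim:
        # Fixed-width char storage adds one dimension whose length is the
        # longest string, measured in bytes
        strlen = max((len(v.encode("utf-8")) for v in values), default=1)
        chunksizes = chunksizes + (strlen,)
    return chunksizes


labels = ["forest", "water", "urban"]
print(guard_chunksizes_str_sketch(labels, (3,)))    # (3, 6)
print(guard_chunksizes_str_sketch(labels, (3, 6)))  # already guarded: (3, 6)
```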

stems.io.encoding.guard_dtype(xarr, dtype_)

Guard dtype encoding for datetime datatypes

Parameters
  • xarr (xarray.DataArray or np.ndarray) – DataArray to consider

  • dtype_ (dict[str, np.dtype]) – Datatype information for encoding (e.g., {'dtype': np.float32})

Returns

Datatype information for encoding (e.g., {'dtype': np.float32}), if valid. Otherwise returns empty dict

Return type

dict[str, np.dtype]
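The behaviour described above might look roughly like the following. This is a sketch under the assumption that "valid" means "not a datetime dtype", presumably because xarray applies its own time encoding; guard_dtype_sketch is a hypothetical name and plain NumPy arrays stand in for DataArrays.

```python
import numpy as np


def guard_dtype_sketch(arr, dtype_):
    """Drop the dtype encoding when the data are datetimes (assumed rule)."""
    if np.issubdtype(np.asarray(arr).dtype, np.datetime64):
        return {}
    return dtype_


values = np.array([1.0, 2.0], dtype=np.float32)
times = np.array(["2000-01-01", "2000-01-02"], dtype="datetime64[ns]")

print(guard_dtype_sketch(values, {"dtype": values.dtype}))  # dtype entry kept
print(guard_dtype_sketch(times, {"dtype": times.dtype}))    # {}
```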

stems.io.encoding.netcdf_encoding(data, chunks=None, zlib=True, complevel=4, nodata=None, **encoding_kwds)

Return “good” NetCDF encoding information for some data

The returned encoding is the default or “good” known standard for data used in stems. Each default determined in this function is given as a keyword argument to allow overriding, and you can also pass additional encoding items via **encoding_kwds. You may pass one override for all data, or overrides for each data variable (as a dict).

For more information, see the NetCDF4 documentation for createVariable [1].

Parameters
  • data (xr.DataArray or xr.Dataset) – Define encoding for this data. If xr.Dataset, map function across all xr.DataArray in data.data_vars

  • dtype (np.dtype, optional) – The data type used for the encoded data. Defaults to the input data type(s), but can be set to facilitate discretization-based compression (typically alongside scale_factor and _FillValue)

  • chunks (None, tuple or dict, optional) – Chunksizes used to encode NetCDF. If given as a tuple, chunks should be given for each dimension. Chunks for dimensions not specified when given as a dict will default to 1. Passing False will not use chunks.

  • zlib (bool, optional) – Use compression

  • complevel (int, optional) – Compression level

  • nodata (int, float, or sequence, optional) – NoDataValue(s). Specify one for each DataArray in data if data is an xarray.Dataset. Used for _FillValue

  • encoding_kwds (dict) – Additional encoding data to pass
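The tuple and dict forms of chunks described above can be sketched as follows. This only illustrates the documented rule that dimensions missing from a dict default to 1; resolve_chunks_sketch is a hypothetical helper, and the None and False cases are left out because their handling is not spelled out here.

```python
def resolve_chunks_sketch(dims, chunks):
    """Turn a tuple or dict of chunk sizes into one size per dimension."""
    if isinstance(chunks, dict):
        # Dimensions missing from the dict default to a chunk size of 1
        return tuple(int(chunks.get(dim, 1)) for dim in dims)
    chunks = tuple(int(c) for c in chunks)
    if len(chunks) != len(dims):
        raise ValueError("a tuple must give one chunk size per dimension")
    return chunks


dims = ("time", "y", "x")
print(resolve_chunks_sketch(dims, {"y": 256, "x": 256}))  # (1, 256, 256)
print(resolve_chunks_sketch(dims, (1, 128, 128)))         # (1, 128, 128)
```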

Returns

Dict mapping band name (e.g., variable name) to relevant encoding information

Return type

dict

See also

xarray.Dataset.to_netcdf()

Encoding information designed to be passed to xarray.Dataset.to_netcdf().
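As a rough picture of the returned structure, the sketch below builds a dict that maps variable names to encoding items, with either one override for all variables or a dict of per-variable overrides. It is not the stems implementation: netcdf_encoding_sketch and pick are hypothetical helpers, the variable names are made up, and per-variable nodata is given as a dict here for brevity, whereas the function above documents a sequence.

```python
def pick(value, name):
    """Return the per-variable override if `value` is a dict, else `value`."""
    return value.get(name) if isinstance(value, dict) else value


def netcdf_encoding_sketch(names, zlib=True, complevel=4, nodata=None,
                           **encoding_kwds):
    """Build a {variable name: encoding items} dict for the given names."""
    encoding = {}
    for name in names:
        enc = {"zlib": pick(zlib, name), "complevel": pick(complevel, name)}
        fill = pick(nodata, name)
        if fill is not None:
            enc["_FillValue"] = fill
        # Any extra keyword becomes an encoding item for this variable
        enc.update({key: pick(val, name) for key, val in encoding_kwds.items()})
        encoding[name] = enc
    return encoding


enc = netcdf_encoding_sketch(
    ["red", "nir"],
    complevel={"red": 4, "nir": 9},  # per-variable override
    nodata=-9999,                    # one value for all variables
    dtype="int16",                   # extra item passed via **encoding_kwds
)
print(enc["nir"])
```

With real data, a dict of this shape would be passed as the encoding argument of xarray.Dataset.to_netcdf().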

References

[1] http://unidata.github.io/netcdf4-python/#netCDF4.Dataset.createVariable