stems.io.chunk module

Handle logic related to chunks and chunksizes

Chunks vs chunksizes:

"Chunks" refers to a collection of chunk sizes organized by dimension

"Chunksizes" refers to a scalar size (an integer) organized by dimension

  • e.g., {'time': 3}

  • chunksizes is used in encoding for NetCDF4 xarray backend
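To make the distinction concrete, the sketch below reduces a dask-style "chunks" mapping (the length of every block along each dimension) to "chunksizes" (one integer per dimension). The helper name `to_chunksizes` is hypothetical and not part of this module. It assumes the first block along a dimension is representative, which holds for regular chunking where only the last block may be smaller.

```python
def to_chunksizes(chunks):
    """Reduce {dim: (block, block, ...)} to {dim: block} (hypothetical helper)."""
    # Assumption: the first block length is the nominal chunk size
    return {dim: int(blocks[0]) for dim, blocks in chunks.items() if blocks}


# "Chunks": every block length along each dimension
chunks = {'time': (3, 3, 3, 1), 'y': (50, 50), 'x': (50, 50)}

# "Chunksizes": a single integer per dimension
chunksizes = to_chunksizes(chunks)
print(chunksizes)  # {'time': 3, 'y': 50, 'x': 50}
```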

stems.io.chunk.auto_determine_chunks(filename)

Try to guess the best chunksizes for a file

Parameters

filename (str) – File to read

Returns

Best guess for chunksizes to use for each dimension

Return type

dict

stems.io.chunk.best_chunksizes(chunks, tiebreaker=max)

Decide which chunksize to use for each dimension, based on the chunksizes of the variables

Parameters
  • chunks (Mapping[str, Mapping[str, int]]) – Mapping of variable names to variable chunksizes

  • tiebreaker (callable, optional) – Controls what chunksize should be used for a dimension in the event of a tie. For example, if 3 variables had a chunksize of 250 and another 3 had a chunksize of 500, the guess is determined by callable([250, 500]). By default, prefer the larger chunksize (i.e., max())

Returns

Chunksize per dimension

Return type

dict

Examples

>>> chunks = {
... 'blu': {'x': 5, 'y': 5},
... 'grn': {'x': 5, 'y': 10},
... 'red': {'x': 5, 'y': 5},
... 'ordinal': None
... }
>>> best_chunksizes(chunks)
{'x': 5, 'y': 5}
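The selection rule described above (most common chunksize per dimension, with the tiebreaker applied only when counts are equal) can be sketched in plain Python. This is an illustration of the documented behavior, not the module's actual implementation. The name `guess_chunksizes` is hypothetical, and skipping variables whose chunks are None is an assumption based on the 'ordinal' entry in the example.

```python
from collections import Counter, defaultdict


def guess_chunksizes(chunks, tiebreaker=max):
    """Pick the most common chunksize per dimension (illustrative sketch)."""
    votes = defaultdict(Counter)
    for var_chunks in chunks.values():
        if not var_chunks:  # assumption: skip variables without chunk info
            continue
        for dim, size in var_chunks.items():
            votes[dim][size] += 1

    best = {}
    for dim, counter in votes.items():
        top = max(counter.values())
        tied = [size for size, n in counter.items() if n == top]
        # Only consult the tiebreaker when several sizes share the top count
        best[dim] = tied[0] if len(tied) == 1 else tiebreaker(tied)
    return best


chunks = {
    'blu': {'x': 5, 'y': 5},
    'grn': {'x': 5, 'y': 10},
    'red': {'x': 5, 'y': 5},
    'ordinal': None,
}
print(guess_chunksizes(chunks))  # {'x': 5, 'y': 5}

tie = {'a': {'x': 250}, 'b': {'x': 500}}
print(guess_chunksizes(tie))                  # {'x': 500}
print(guess_chunksizes(tie, tiebreaker=min))  # {'x': 250}
```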

stems.io.chunk.chunks_to_chunksizes(data, dims=None)

Convert an object to chunksizes (i.e., used in encoding)

Parameters
  • data (xarray.DataArray, dict, or xarray.Dataset) – Input data containing chunk information

  • dims (Sequence[str], optional) – Optionally, provide the order in which dimension chunksizes should be returned. Useful when asking for chunksizes from not-necessarily-ordered data (dicts and Datasets)

Returns

Chunk sizes for each dimension. Returns an empty tuple if there are no chunks.

Return type

tuple
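For the dict case, the conversion might look like the sketch below. The helper `chunksizes_tuple` is hypothetical and only mirrors the documented contract: a tuple ordered by dims, and an empty tuple when there are no chunks. The order matters because the xarray NetCDF4 backend expects the chunksizes encoding as a tuple in the variable's dimension order.

```python
def chunksizes_tuple(chunks, dims=None):
    """Order per-dimension chunk sizes as a tuple (hypothetical helper)."""
    if not chunks:
        return ()
    # Without an explicit order, fall back to the mapping's own order
    dims = list(dims) if dims is not None else list(chunks)
    sizes = []
    for dim in dims:
        size = chunks[dim]
        # Accept either a scalar chunksize or a tuple of block lengths
        if isinstance(size, (tuple, list)):
            size = size[0]
        sizes.append(int(size))
    return tuple(sizes)


chunks = {'x': (50, 50), 'time': (3, 3, 3, 1), 'y': 25}
print(chunksizes_tuple(chunks, dims=('time', 'y', 'x')))  # (3, 25, 50)
print(chunksizes_tuple({}))  # ()
```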

stems.io.chunk.get_chunksizes(xarr)

Return the chunk sizes used for each dimension in xarr

Parameters

xarr (xr.DataArray or xr.Dataset) – Chunked data

Returns

Dimensions (keys) and chunk sizes (values)

Return type

dict

Raises

TypeError – Raised if input is not a Dataset or DataArray

stems.io.chunk.read_chunks(filename, variables=None)

Return chunks associated with each variable if possible

Parameters
  • filename (str) – Read chunks from this file

  • variables (Sequence, optional) – Subset of variables to retrieve chunking for

Returns

Mapping of variable names to chunks. The chunks for each variable are stored as a mapping of dimension name to chunksize (e.g., {'x': 250})

Return type

Mapping[str, Mapping[str, int]]

Raises

ValueError – Raised if no chunks can be determined (unknown file format, etc.)
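One plausible way to get the "if possible" behavior is to try several format-specific readers in turn and raise ValueError only when none of them succeeds. The sketch below assumes that design; it is not confirmed by this page. The function `read_chunks_with_fallback` and both stand-in readers are hypothetical.

```python
def read_chunks_with_fallback(filename, readers):
    """Try each reader in turn; raise ValueError if none succeeds."""
    errors = []
    for reader in readers:
        try:
            chunks = reader(filename)
        except Exception as exc:  # treat any failure as "not my format"
            errors.append(f'{reader.__name__}: {exc}')
            continue
        if chunks:
            return chunks
    detail = '; '.join(errors)
    raise ValueError(f'Cannot determine chunks for {filename!r} ({detail})')


# Stand-in readers used only for this illustration
def fake_netcdf_reader(filename):
    if not filename.endswith('.nc'):
        raise OSError('not a NetCDF file')
    return {'red': {'y': 250, 'x': 250}}


def fake_raster_reader(filename):
    if not filename.endswith('.tif'):
        raise OSError('not a raster file')
    return {'band_1': {'y': 256, 'x': 256}}


readers = [fake_netcdf_reader, fake_raster_reader]
print(read_chunks_with_fallback('scene.tif', readers))

try:
    read_chunks_with_fallback('notes.txt', readers)
except ValueError as err:
    print(err)
```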

stems.io.chunk.read_chunks_netcdf4(filename, variables=None)

Return chunks associated with each variable

Parameters
  • filename (str) – Filename of NetCDF file

  • variables (Sequence, optional) – Subset of variables to retrieve chunking for

Returns

Mapping of variable names to chunks. The chunks for each variable are stored as a mapping of dimension name to chunksize (e.g., {'x': 250})

Return type

Mapping[str, Mapping[str, int]]
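In the netCDF4 Python library, a variable exposes its dimension names as `Variable.dimensions` and its chunk layout through `Variable.chunking()`, which returns the string 'contiguous' for unchunked variables and a list of chunk sizes otherwise. The sketch below pairs the two for a single variable. It works on plain values so it runs without a NetCDF file. The helper name `variable_chunks` and the choice to return None for unchunked variables are assumptions.

```python
def variable_chunks(dimensions, chunking):
    """Pair dimension names with chunk sizes for one variable.

    ``dimensions`` mimics ``netCDF4.Variable.dimensions`` and ``chunking``
    mimics the return value of ``netCDF4.Variable.chunking()``.
    """
    if chunking is None or chunking == 'contiguous':
        return None  # assumption: unchunked variables have no chunk mapping
    return dict(zip(dimensions, (int(size) for size in chunking)))


print(variable_chunks(('time', 'y', 'x'), [1, 250, 250]))
# {'time': 1, 'y': 250, 'x': 250}
print(variable_chunks(('time',), 'contiguous'))
# None
```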

stems.io.chunk.read_chunks_rasterio(riods)

Return chunks for a rasterio dataset, formatted for xarray

Parameters

riods (str, pathlib.Path, or rasterio.DatasetReader) – Rasterio dataset or path to dataset

Returns

Chunks as expected by xarray (e.g., {'x': 50, 'y': 50})

Return type

dict
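Rasterio reports the internal block layout of a dataset through `DatasetReader.block_shapes`, a list with one (rows, columns) pair per band. A conversion to the xarray-style mapping could look like the sketch below. The helper `block_shapes_to_chunks` is hypothetical, and picking the most common block shape when bands disagree is an assumption, not documented behavior.

```python
from collections import Counter


def block_shapes_to_chunks(block_shapes):
    """Convert rasterio-style (rows, cols) block shapes to xarray-style chunks."""
    if not block_shapes:
        return {}
    # Assumption: use the block shape shared by the most bands
    counts = Counter(tuple(shape) for shape in block_shapes)
    (rows, cols), _ = counts.most_common(1)[0]
    # Rows run along 'y' and columns along 'x'
    return {'y': int(rows), 'x': int(cols)}


print(block_shapes_to_chunks([(50, 50), (50, 50), (50, 50)]))
# {'y': 50, 'x': 50}
print(block_shapes_to_chunks([(1, 512)]))
# {'y': 1, 'x': 512}
```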