stems.io.chunk module

Handle logic related to chunks and chunksizes

Chunks vs chunksizes:

"Chunks" refers to a collection of chunk sizes organized by dimension

e.g., {'time': (3, 3, 3, 1, )}

For dask.array.Array and xarray.DataArray, .chunks is a tuple of per-dimension tuples of block sizes

For xarray.Dataset, .chunks is a mapping of dimension name to block sizes

"Chunksizes" refers to a scalar size (an integer) organized by dimension

e.g., {'time': 3}

chunksizes is used in the encoding for xarray's NetCDF4 backend
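
To make the distinction concrete, here is a minimal sketch in plain Python. It is not the module's implementation: the helper name `largest_block` is hypothetical, and it assumes the scalar chunksize for a dimension is the largest block along it (for regularly chunked data, every block except possibly the last).

```python
# Hypothetical helper for illustration; not part of stems.io.chunk.
def largest_block(chunks):
    """Collapse per-dimension block sizes into one integer per dimension."""
    # Assumption: the largest block along a dimension is its chunksize.
    return {dim: max(sizes) for dim, sizes in chunks.items() if sizes}


chunks = {'time': (3, 3, 3, 1), 'y': (5, 5), 'x': (10,)}  # "chunks"
chunksizes = largest_block(chunks)                         # "chunksizes"
print(chunksizes)
```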

stems.io.chunk.auto_determine_chunks(filename)

Try to guess the best chunksizes for a file

Parameters: filename (str) – File to read
Returns: Best guess for chunksizes to use for each dimension
Return type: dict

stems.io.chunk.best_chunksizes(chunks, tiebreaker=max)

Decide which chunksize to use for each dimension, based on the chunksizes of the variables

Parameters

chunks (Mapping[str, Mapping[str, int]]) – Mapping of variable names to variable chunksizes
tiebreaker (callable, optional) – Controls what chunksize should be used for a dimension in the event of a tie. For example, if 3 variables had a chunksize of 250 and another 3 had a chunksize of 500, the guess is determined by callable([250, 500]). By default, prefer the larger chunksize (i.e., max())

Returns

Chunksize per dimension

Return type

dict

Examples

>>> chunks = {
...     'blu': {'x': 5, 'y': 5},
...     'grn': {'x': 5, 'y': 10},
...     'red': {'x': 5, 'y': 5},
...     'ordinal': None
... }
>>> best_chunksizes(chunks)
{'x': 5, 'y': 5}
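
The voting and tiebreaking behavior described above could be sketched as follows. This is a hypothetical re-implementation (`vote_chunksizes` is not a library function), and it assumes that variables without chunk information (None) are skipped and that the tiebreaker is only called when several sizes share the highest count.

```python
from collections import Counter


# Hypothetical re-implementation for illustration; not the library's code.
def vote_chunksizes(chunks, tiebreaker=max):
    """Pick the most common chunksize per dimension across variables."""
    votes = {}
    for var_chunks in chunks.values():
        # Assumption: variables without chunk information (None) are skipped.
        if not var_chunks:
            continue
        for dim, size in var_chunks.items():
            votes.setdefault(dim, Counter())[size] += 1

    best = {}
    for dim, counter in votes.items():
        top = max(counter.values())
        # All sizes sharing the highest count are candidates.
        candidates = [size for size, n in counter.items() if n == top]
        if len(candidates) == 1:
            best[dim] = candidates[0]
        else:
            best[dim] = tiebreaker(candidates)
    return best


tied = {
    'blu': {'x': 250},
    'grn': {'x': 500},
}
print(vote_chunksizes(tied))                  # tie resolved with max
print(vote_chunksizes(tied, tiebreaker=min))  # tie resolved with min
```

With the tied input, the default tiebreaker yields {'x': 500}, while passing min yields {'x': 250}.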

stems.io.chunk.chunks_to_chunksizes(data, dims=None)

Convert an object to chunksizes (i.e., used in encoding)

Parameters

data (xarray.DataArray, dict, or xarray.Dataset) – Input data containing chunk information
dims (Sequence[str], optional) – Optionally, provide the order in which dimension chunksizes should be returned. Useful when asking for chunksizes from not-necessarily-ordered data (dicts and Datasets)

Returns

Chunk sizes for each dimension. Returns an empty tuple if there are no chunks.

Return type

tuple
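
The role of dims can be illustrated with a small sketch for the dict case only. The helper `ordered_chunksizes` is hypothetical, and falling back to the dict's own key order when dims is omitted is an assumption, not documented behavior.

```python
# Hypothetical helper for illustration; handles only the dict case.
def ordered_chunksizes(chunksizes, dims=None):
    """Return chunksizes as a tuple, ordered by ``dims`` when given."""
    if not chunksizes:
        return ()  # no chunk information
    # Assumption: without ``dims``, fall back to the dict's own key order.
    order = dims if dims is not None else list(chunksizes)
    return tuple(chunksizes[dim] for dim in order)


sizes = {'x': 250, 'time': 1, 'y': 500}
encoding_chunksizes = ordered_chunksizes(sizes, dims=('time', 'y', 'x'))
print(encoding_chunksizes)
```

Passing dims explicitly matters because the tuple carries no dimension names, so its order must match the dimension order of the variable being encoded.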

stems.io.chunk.get_chunksizes(xarr)

Return the chunk sizes used for each dimension in xarr

Parameters: xarr (xarray.DataArray or xarray.Dataset) – Chunked data
Returns: Dimensions (keys) and chunk sizes (values)
Return type: dict
Raises: TypeError – Raised if input is not a Dataset or DataArray

stems.io.chunk.read_chunks(filename, variables=None)

Return chunks associated with each variable if possible

Parameters

filename (str) – Read chunks from this file
variables (Sequence, optional) – Subset of variables to retrieve chunking for

Returns

Mapping of variable names to chunks. Chunks are stored as a mapping of dimension name to chunksize (e.g., {'x': 250})

Return type

Mapping[str, Mapping[str, int]]

Raises

ValueError – Raised if no chunks can be determined (unknown file format, etc.)

stems.io.chunk.read_chunks_netcdf4(filename, variables=None)

Return chunks associated with each variable

Parameters

filename (str) – Filename of NetCDF file
variables (Sequence, optional) – Subset of variables to retrieve chunking for

Returns

Mapping of variable names to chunks. Chunks are stored as a mapping of dimension name to chunksize (e.g., {'x': 250})

Return type

Mapping[str, Mapping[str, int]]
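
As a rough sketch of how a per-variable mapping like this might be built, the snippet below pairs dimension names with chunk sizes. It relies on the netCDF4 library's `Variable.dimensions` and `Variable.chunking()`, but uses plain stand-in values so it runs without a file. The helper `chunking_to_mapping` is hypothetical, and returning None for unchunked variables is an assumption.

```python
# Hypothetical helper for illustration; not the module's implementation.
def chunking_to_mapping(dimensions, chunking):
    """Pair dimension names with the result of ``Variable.chunking()``."""
    # netCDF4 reports the string 'contiguous' for variables stored unchunked.
    if chunking == 'contiguous':
        return None
    return dict(zip(dimensions, chunking))


# Stand-ins for ``var.dimensions`` and ``var.chunking()`` of two variables.
print(chunking_to_mapping(('time', 'y', 'x'), [1, 250, 250]))
print(chunking_to_mapping(('time',), 'contiguous'))
```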

stems.io.chunk.read_chunks_rasterio(riods)

Return chunks for a rasterio dataset, formatted for xarray

Parameters: riods (str, pathlib.Path, or rasterio.DatasetReader) – Rasterio dataset or path to dataset
Returns: Chunks as expected by xarray (e.g., {'x': 50, 'y': 50})
Return type: dict
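
One way such a mapping could be derived is from the `block_shapes` attribute of a rasterio dataset, which lists a (rows, columns) block shape per band. The sketch below uses a stand-in list so it runs without rasterio. The helper `block_shapes_to_chunks` is hypothetical, and choosing the most common block shape when bands disagree is an assumption.

```python
from collections import Counter


# Hypothetical helper for illustration; not the module's implementation.
def block_shapes_to_chunks(block_shapes):
    """Turn per-band (rows, cols) block shapes into xarray-style chunks."""
    # Assumption: when bands disagree, use the most common block shape.
    (rows, cols), _ = Counter(block_shapes).most_common(1)[0]
    return {'y': rows, 'x': cols}


# Stand-in for ``rasterio.open(path).block_shapes`` on a 3-band tiled file.
print(block_shapes_to_chunks([(50, 50), (50, 50), (50, 50)]))
```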