stems.io.chunk module
Handle chunks/chunksize related logic
Chunks vs chunksizes:
"Chunks" refers to a collection of chunk sizes organized by dimension
e.g., {'time': (3, 3, 3, 1, )}
For dask.array.Array and xarray.DataArray, .chunks is a tuple
For xarray.Dataset, .chunks is a mapping
"Chunksizes" refers to a scalar size (an integer) organized by dimension
e.g., {'time': 3}
chunksizes is used in encoding for the NetCDF4 xarray backend
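The difference between the two conventions can be illustrated with plain Python data. This is a minimal sketch, not this module's implementation: the helper name `first_block_sizes` is hypothetical, and it assumes the first block along each dimension is the representative size (any shorter remainder block comes last, as in the example above).

```python
def first_block_sizes(chunks):
    """Collapse "chunks" (all block lengths) to "chunksizes" (one scalar).

    Assumption: the first block length along a dimension is the nominal
    chunk size; a shorter remainder block, if any, comes last.
    """
    return {dim: sizes[0] for dim, sizes in chunks.items()}


# "Chunks": every block length along each dimension (10 time steps here)
chunks = {'time': (3, 3, 3, 1)}

# "Chunksizes": a single integer per dimension
chunksizes = first_block_sizes(chunks)
print(chunksizes)
```

The block lengths in `chunks` sum to the full length of the dimension, while `chunksizes` keeps only the nominal size per dimension.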
- stems.io.chunk.auto_determine_chunks(filename)
Try to guess the best chunksizes for a filename
- stems.io.chunk.best_chunksizes(chunks, tiebreaker=max)
Decide which chunksize to use for each dimension from variables
- Parameters
chunks (Mapping[str, Mapping[str, int]]) – Mapping of variable names to variable chunksizes
tiebreaker (callable, optional) – Controls what chunksize should be used for a dimension in the event of a tie. For example, if 3 variables had a chunksize of 250 and another 3 had a chunksize of 500, the guess is determined by callable([250, 500]). By default, prefer the larger chunksize (i.e., max())
- Returns
Chunksize per dimension
- Return type
Examples
>>> chunks = {
...     'blu': {'x': 5, 'y': 5},
...     'grn': {'x': 5, 'y': 10},
...     'red': {'x': 5, 'y': 5},
...     'ordinal': None
... }
>>> best_chunksizes(chunks)
{'x': 5, 'y': 5}
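The selection rule described above (most common chunksize per dimension, with a tiebreaker for ties) can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the library's source: the name `pick_chunksizes` is hypothetical, and it assumes variables with no chunk information (such as `None`) are simply skipped.

```python
from collections import Counter, defaultdict


def pick_chunksizes(chunks, tiebreaker=max):
    """Pick the most common chunksize per dimension across variables."""
    votes = defaultdict(Counter)
    for var_chunks in chunks.values():
        if not var_chunks:
            # Assumption: variables without chunk info (None or {}) don't vote
            continue
        for dim, size in var_chunks.items():
            votes[dim][size] += 1

    best = {}
    for dim, counter in votes.items():
        top = max(counter.values())
        candidates = [size for size, n in counter.items() if n == top]
        # Only consult the tiebreaker when several sizes share the top count
        if len(candidates) == 1:
            best[dim] = candidates[0]
        else:
            best[dim] = tiebreaker(candidates)
    return best


chunks = {
    'blu': {'x': 5, 'y': 5},
    'grn': {'x': 5, 'y': 10},
    'red': {'x': 5, 'y': 5},
    'ordinal': None,
}
majority = pick_chunksizes(chunks)

# A genuine tie: one variable votes for 250, another for 500
tie = {'a': {'x': 250}, 'b': {'x': 500}}
prefer_large = pick_chunksizes(tie)
prefer_small = pick_chunksizes(tie, tiebreaker=min)
```

Passing `min` instead of the default `max` flips the outcome only for tied dimensions; dimensions with a clear majority are unaffected.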
- stems.io.chunk.chunks_to_chunksizes(data, dims=None)
Convert an object to chunksizes (i.e., used in encoding)
- Parameters
data (xarray.DataArray, dict, or xarray.Dataset) – Input data containing chunk information
dims (Sequence[str], optional) – Optionally, provide the order in which dimension chunksizes should be returned. Useful when asking for chunksizes from not-necessarily-ordered data (dicts and Datasets)
- Returns
Chunk sizes for each dimension. Returns an empty tuple if there are no chunks.
- Return type
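The role of `dims` can be shown with a small stand-in that works on an already-scalar mapping. This is a sketch only: `ordered_chunksizes` and the variable name `'ndvi'` are hypothetical, and the real function also accepts `xarray` objects, which this example does not handle.

```python
def ordered_chunksizes(chunksizes, dims=None):
    """Return chunksizes as a tuple, ordered by ``dims`` when provided."""
    if not chunksizes:
        # Mirrors the documented behavior: no chunks gives an empty tuple
        return ()
    if dims is None:
        # Fall back to the mapping's own (insertion) order
        dims = list(chunksizes)
    return tuple(chunksizes[dim] for dim in dims)


# A dict carries no guaranteed dimension order, so ask for one explicitly
sizes = {'x': 250, 'time': 3, 'y': 250}
ordered = ordered_chunksizes(sizes, dims=('time', 'y', 'x'))

# The tuple can then go into a per-variable encoding dict, e.g. for the
# ``encoding`` argument of ``xarray.Dataset.to_netcdf``
encoding = {'ndvi': {'chunksizes': ordered}}
```

The order matters because the encoding tuple has to line up with the variable's own dimension order.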
- stems.io.chunk.read_chunks(filename, variables=None)
Return chunks associated with each variable if possible
- Parameters
filename (str) – Read chunks from this file
variables (Sequence) – Subset of variables to retrieve chunking for
- Returns
Mapping of variable names to chunks. Chunks are stored as a mapping of dimension name to chunksize (e.g., {'x': 250})
- Return type
- Raises
ValueError – Raised if no chunks can be determined (unknown file format, etc.)
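One way to get the "if possible" behavior and the documented `ValueError` is to try format-specific readers in turn. The sketch below is an assumption about the general pattern, not a description of how this module is implemented; `read_chunks_with_fallback` and `fake_netcdf_reader` are hypothetical names, and the fake reader stands in for a real format reader so the example runs without any files.

```python
def read_chunks_with_fallback(filename, readers, variables=None):
    """Try each reader in order; raise ValueError if none succeeds."""
    for reader in readers:
        try:
            chunks = reader(filename)
        except Exception:
            # This reader can't handle the file; try the next one
            continue
        if variables is not None:
            # Keep only the requested subset of variables
            chunks = {name: c for name, c in chunks.items()
                      if name in variables}
        return chunks
    raise ValueError(f'Cannot determine chunks for "{filename}"')


def fake_netcdf_reader(filename):
    """Hypothetical stand-in reader that only accepts ``.nc`` names."""
    if not filename.endswith('.nc'):
        raise OSError('not a NetCDF file')
    return {'red': {'x': 250, 'y': 250}, 'nir': {'x': 250, 'y': 250}}


found = read_chunks_with_fallback('scene.nc', [fake_netcdf_reader],
                                  variables=['red'])
```

Calling the same function with a filename no reader accepts (for example `'scene.xyz'`) raises `ValueError`.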
- stems.io.chunk.read_chunks_netcdf4(filename, variables=None)
Return chunks associated with each variable
- stems.io.chunk.read_chunks_rasterio(riods)
Returns chunks for rasterio dataset formatted for xarray
- Parameters
riods (str, pathlib.Path, or rasterio.DatasetReader) – Rasterio dataset or path to dataset
- Returns
Chunks as expected by xarray (e.g., {'x': 50, 'y': 50})
- Return type
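A rasterio dataset reports its internal tiling through the `block_shapes` attribute, a list of `(rows, cols)` tuples with one entry per band. The sketch below shows how such a list could be turned into an xarray-style mapping. It is illustrative only: `block_shapes_to_chunks` is a hypothetical helper, it works on a plain list so it runs without rasterio, and rejecting mixed block shapes is an assumption rather than documented behavior.

```python
def block_shapes_to_chunks(block_shapes):
    """Convert per-band (rows, cols) block shapes to {'y': ..., 'x': ...}."""
    if not block_shapes:
        return {}
    if len(set(block_shapes)) != 1:
        # Assumption: bands with differing block shapes are not supported
        raise ValueError('Bands have differing block shapes')
    rows, cols = block_shapes[0]
    # Rows run along 'y' and columns along 'x'
    return {'y': rows, 'x': cols}


# With rasterio this list would come from ``src.block_shapes`` on an
# opened dataset; here it is written out by hand for a 3-band image
shapes = [(50, 50), (50, 50), (50, 50)]
chunks = block_shapes_to_chunks(shapes)
print(chunks)
```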