Coverage for stems/io/chunk.py : 82%

""" Handle chunks/chunksize related logic
Chunks vs chunksizes::
"Chunks" refers to a collection of chunk sizes organized by dimension
* e.g., ``{'time': (3, 3, 3, 1, )}`` * For :py:class:`dask.array.Array` and :py:class:`xarray.DataArray`, ``.chunks`` is a tuple * :py:class:`xarray.Dataset` ``.chunks`` is a mapping
"Chunksizes" refers to a scalar size (an integer) organized by dimension
* e.g., ``{'time': 3}`` * ``chunksizes`` is used in encoding for NetCDF4 xarray backend
"""
# ----------------------------------------------------------------------------
# Read chunks from files
def read_chunks(filename, variables=None):
    """ Return chunks associated with each variable if possible

    Parameters
    ----------
    filename : str
        Read chunks from this file
    variables : Sequence
        Subset of variables to retrieve ``chunking`` for

    Returns
    -------
    Mapping[str, Mapping[str, int]]
        Mapping of variable names to chunks. Chunks are stored mapping
        dimension name to chunksize (e.g., ``{'x': 250}``)

    Raises
    ------
    ValueError
        Raised if no chunks can be determined (unknown file format, etc.)
    """
    read_funcs = (read_chunks_netcdf4, )
    for func in read_funcs:
        try:
            var_chunks = func(filename, variables=variables)
        except Exception:
            logger.debug(f'Could not determine chunks for "{filename}" '
                         f'using "{func.__name__}"', exc_info=True)
        else:
            return var_chunks
    raise ValueError(f'Could not determine chunks for "{filename}"')
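
# Illustrative sketch of the reader-fallback pattern above, using hypothetical
# stand-in readers (``_reader_a``/``_reader_b`` are assumptions, not part of
# this module): try each reader in turn, return the first success, and raise
# ``ValueError`` only if every reader fails.
def _reader_a(filename):
    raise RuntimeError('cannot parse')  # simulates an unsupported format

def _reader_b(filename):
    return {'var': {'x': 250}}  # simulates a successful read

def _read_first_success(filename, read_funcs=(_reader_a, _reader_b)):
    for func in read_funcs:
        try:
            result = func(filename)
        except Exception:
            continue  # fall through to the next reader
        else:
            return result
    raise ValueError(f'Could not determine chunks for "{filename}"')

# _read_first_success('example.nc') -> {'var': {'x': 250}}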
""" Return chunks associated with each variable
Parameters ---------- filename : str Filename of NetCDF file variables : Sequence Subset of variables to retrieve `chunking` for
Returns ------- Mapping[str, Mapping[str, int]] Mapping of variable names to chunks. Chunks are stored mapping dimension name to chunksize (e.g., ``{'x': 250}``) """ # Keep this import inside incase user doesn't have library # (e.g., with a minimal install of xarray)
'saved chunksizes') # Store info on each chunk: what vars use, and how many (_dim, _chunk) for _dim, _chunk in zip(dims, chunking) if not _dim.startswith('string') )) else:
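
# Illustrative sketch: pairing dimension names with chunk lengths while
# skipping NetCDF string-length dimensions, as done above (data are made up
# for demonstration).
_dims = ('time', 'y', 'x', 'string8')
_chunking = (1, 250, 250, 8)
_var_chunks = dict((d, c) for d, c in zip(_dims, _chunking)
                   if not d.startswith('string'))
# _var_chunks == {'time': 1, 'y': 250, 'x': 250}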
""" Returns chunks for rasterio dataset formatted for xarray
Parameters ---------- riods : str, pathlib.Path, or rasterio.DatasetReader Rasterio dataset or path to dataset
Returns ------- dict Chunks as expected by xarray (e.g., ``{'x': 50, 'y': 50}``) """ # Keep this import inside incase user doesn't have library # (e.g., with a minimal install of xarray) # Open it for ourselves
warnings.warn('Block shapes inconsistent across bands. ' 'Using block shapes from first band')
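
# Illustrative sketch: detecting inconsistent block shapes across bands using
# plain tuples standing in for rasterio's ``block_shapes`` (data are made up
# for demonstration).
_block_shapes = [(512, 512), (512, 512), (256, 256)]
_consistent = len(set(_block_shapes)) == 1
# _consistent is False here, so the first band's shape would be used
_first_y, _first_x = _block_shapes[0]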
# ----------------------------------------------------------------------------
# Chunk heuristics
def best_chunksizes(chunks, tiebreaker=max):
    """Decide which chunksize to use for each dimension from variables

    Parameters
    ----------
    chunks : Mapping[str, Mapping[str, int]]
        Mapping of variable names to variable chunksizes
    tiebreaker : callable, optional
        Controls what chunksize should be used for a dimension in the event
        of a tie. For example, if 3 variables had a chunksize of 250 and
        another 3 had a chunksize of 500, the guess is determined by
        ``callable([250, 500])``. By default, prefer the larger chunksize
        (i.e., :py:func:`max`)

    Returns
    -------
    dict
        Chunksize per dimension

    Examples
    --------

    >>> chunks = {
    ...     'blu': {'x': 5, 'y': 5},
    ...     'grn': {'x': 5, 'y': 10},
    ...     'red': {'x': 5, 'y': 5},
    ...     'ordinal': None
    ... }
    >>> best_chunksizes(chunks)
    {'x': 5, 'y': 5}

    """
    # Collect all chunksizes as {dim: [chunksizes, ...]}
    dim_sizes = defaultdict(list)
    for var_chunks in chunks.values():
        # Guard if chunks/chunksizes is None
        if var_chunks:
            for dim, size in var_chunks.items():
                dim_sizes[dim].append(size)

    chunksizes = {}
    for dim, sizes in dim_sizes.items():
        # Use most frequently used chunksize
        counts = Counter(sizes).most_common()
        top_count = counts[0][1]
        ties = [size for size, count in counts if count == top_count]
        if len(ties) > 1:
            # If multiple, prefer biggest value (by default)
            best = tiebreaker(ties)
            logger.debug(f'Broke tie for chunksize of dimension "{dim}" using '
                         f'`{tiebreaker}`')
        else:
            best = ties[0]
        chunksizes[dim] = best
    return chunksizes
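
# Illustrative, self-contained sketch of the tie-breaking heuristic described
# above (``_pick_chunksize`` is a stand-in helper, not part of this module):
# count chunksize frequencies and break ties with ``tiebreaker`` (``max`` by
# default).
from collections import Counter as _Counter

def _pick_chunksize(sizes, tiebreaker=max):
    counts = _Counter(sizes).most_common()
    top = counts[0][1]
    ties = [size for size, n in counts if n == top]
    return tiebreaker(ties)

# _pick_chunksize([5, 10, 5]) -> 5 (most common wins)
# _pick_chunksize([250, 500]) -> 500 (tie broken by max)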
""" Try to guess the best chunksizes for a filename
Parameters ---------- filename : str File to read
Returns ------- dict Best guess for chunksizes to use for each dimension """ try: var_chunks = read_chunks(str(filename)) except ValueError: logger.debug('"auto" chunk determination failed') chunks = None else: chunks = best_chunksizes(var_chunks)
return chunks
# ----------------------------------------------------------------------------
# Chunk format handling
def get_chunksizes(xarr):
    """ Return the chunk sizes used for each dimension in `xarr`

    Parameters
    ----------
    xarr : xr.DataArray or xr.Dataset
        Chunked data

    Returns
    -------
    dict
        Dimensions (keys) and chunk sizes (values)

    Raises
    ------
    TypeError
        Raised if input is not a Dataset or DataArray
    """
    if isinstance(xarr, xr.DataArray):
        return _get_chunksizes_dataarray(xarr)
    elif isinstance(xarr, xr.Dataset):
        return _get_chunksizes_dataset(xarr)
    else:
        raise TypeError('Input must be an xarray Dataset or DataArray, '
                        f'not "{type(xarr)}"')
def _get_chunksizes_dataarray(xarr):
    return OrderedDict((
        (dim, xarr.chunks[i][0])
        for i, dim in enumerate(xarr.dims)
    ))
def _get_chunksizes_dataset(xarr):
    return OrderedDict((
        (dim, chunks[0])
        for dim, chunks in xarr.chunks.items()
    ))
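
# Illustrative sketch: extracting the first block size per dimension from a
# Dataset-style ``.chunks`` mapping, mirroring the helper above (data are
# made up; no xarray is needed for the demonstration).
from collections import OrderedDict as _OrderedDict

_ds_chunks = _OrderedDict([('y', (50, 50, 12)), ('x', (50, 50))])
_dim_sizes = _OrderedDict(
    (dim, blocks[0]) for dim, blocks in _ds_chunks.items()
)
# _dim_sizes == OrderedDict([('y', 50), ('x', 50)])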
""" Convert an object to chunksizes (i.e., used in encoding)
Parameters ---------- data : xarray.DataArray, dict, or xarray.Dataset Input data containing chunk information dims : Sequence[str], optional Optionally, provide the order in which dimension chunksizes should be returned. Useful when asking for chunksizes from not-necessarily-ordered data (dicts and Datasets)
Returns ------- tuple Chunk sizes for each dimension. Returns an empty tuple if there are no chunks. """
for d in dims_)
dim_idx = [data.dims.index(d) for d in dims] else:
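
# Illustrative sketch: ordering per-dimension chunk information into the tuple
# form used for NetCDF4 ``chunksizes`` encoding (``_to_chunksizes`` is a
# stand-in helper, not part of this module; data are made up).
def _to_chunksizes(chunks, dims):
    # Accept either a tuple of block sizes or a bare integer per dimension
    return tuple(chunks[d][0] if isinstance(chunks[d], tuple) else chunks[d]
                 for d in dims)

# _to_chunksizes({'x': (250, 250), 'y': 100}, dims=('y', 'x')) -> (100, 250)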