Parallel Computing Helpers

The stems project aims to make it easier to run geospatial time series analysis in parallel using dask and dask.distributed.

Distributed Cluster Setup

The stems.executor module contains a few functions that help create and shut down a distributed.LocalCluster (a sketch of the underlying dask.distributed pattern follows the entries below).

stems.executor.setup_executor([address, …])

Setup a Dask distributed cluster scheduler client

stems.executor.executor_info(client[, ip, …])

Return a list of strings with info for a scheduler
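These helpers build on the standard dask.distributed cluster and client objects. For orientation, a minimal sketch of the equivalent manual setup (using dask.distributed directly; the worker and thread counts are illustrative) looks like this:

    from distributed import Client, LocalCluster

    # Start a local cluster with 4 single-threaded workers (illustrative values)
    cluster = LocalCluster(n_workers=4, threads_per_worker=1)
    client = Client(cluster)

    # The client reports the scheduler address and worker/thread counts --
    # roughly the kind of information executor_info summarizes
    print(client)

    # ... submit computations through the client ...

    # Shut everything down when finished
    client.close()
    cluster.close()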

Command Line Interface Applications

When building command line interface (CLI) applications, you can use some of stems' Click-based callbacks and reusable program arguments or options.

Program Options

stems.cli.options.opt_nprocs

stems.cli.options.opt_nthreads

stems.cli.options.opt_scheduler

These are decorators (see click.option()) that add Dask parallel processing capabilities to Click-based programs.
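As a rough sketch, a Click command using these options might look like the following. The parameter names (nprocs, nthreads, scheduler) and the way the values reach the command function are assumptions made for illustration, not a statement of the exact API:

    import click

    from stems.cli import options as cli_options


    @click.command()
    @cli_options.opt_nprocs      # assumed to add an option for the number of worker processes
    @cli_options.opt_nthreads    # assumed to add an option for threads per worker
    @cli_options.opt_scheduler   # assumed to add an option for the scheduler address
    def main(nprocs, nthreads, scheduler):
        """Example CLI that would run its work on a Dask cluster."""
        click.echo(f'nprocs={nprocs} nthreads={nthreads} scheduler={scheduler}')


    if __name__ == '__main__':
        main()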

Click Callbacks

You might also want to use the following callback functions to develop your own Click program arguments or options (a usage sketch follows these entries).

stems.cli.options.cb_executor(ctx, param, value)

Callback for returning a Distributed client

stems.cli.options.close_scheduler
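For example, cb_executor can be attached as the callback of a custom option so that the raw option value (for example, a scheduler address) is converted into a distributed client before your command runs. The option name, destination, default, and help text below are illustrative assumptions:

    import click

    from stems.cli.options import cb_executor

    # Hypothetical option: its value is handed to cb_executor, which is
    # documented to return a Distributed client
    opt_executor = click.option(
        '--scheduler', 'executor',
        default=None,
        callback=cb_executor,
        help='Dask distributed scheduler address',
    )


    @click.command()
    @opt_executor
    def cli(executor):
        """Example program that receives a distributed client."""
        click.echo(f'Executor: {executor}')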

Block Mapping of Functions

The stems.parallel module contains helpers for running functions on chunks of data.

The functions stems.parallel.iter_chunks() and stems.parallel.iter_noncore_chunks() help you iterate across blocks of data by yielding array dimension slices.

stems.parallel.iter_chunks(dim_sizes[, …])

Return slices that help iterate over dimensions in chunks/blocks

stems.parallel.iter_noncore_chunks(data, …)

Yield data in chunks selected out from non-core dimensions

You can nest these calls: use stems.parallel.iter_chunks() to calculate block slices (e.g., chunksize=100 in x/y) so that many blocks can be processed in parallel on workers, and then iterate over each block with the same function (e.g., chunksize=1 for each dimension) to run an inner function on each pixel.
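To make the pattern concrete, here is a hand-rolled stand-in for this kind of chunk iterator, nested as described above. This is not the stems implementation; the chunk_sizes argument and the dict-of-slices output format are assumptions for the sketch:

    import itertools


    def _iter_chunks(dim_sizes, chunk_sizes):
        """Yield dicts mapping dimension name -> slice, covering the array in blocks."""
        per_dim = []
        for dim, size in dim_sizes.items():
            step = chunk_sizes.get(dim, size)
            per_dim.append([(dim, slice(start, min(start + step, size)))
                            for start in range(0, size, step)])
        for combo in itertools.product(*per_dim):
            yield dict(combo)


    # Outer loop: 100x100 blocks that could be sent to separate workers
    for block in _iter_chunks({'y': 250, 'x': 250}, {'y': 100, 'x': 100}):
        # Inner loop: individual pixels within the extracted block
        # (slices here are relative to the block, not the full array)
        block_sizes = {dim: s.stop - s.start for dim, s in block.items()}
        for pixel in _iter_chunks(block_sizes, {'y': 1, 'x': 1}):
            pass  # run the per-pixel function here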

Sometimes you want to run a function across some arbitrary dimensions and collect the results in a single list. One common example is running the CCDC algorithm on pixels from an image (dims=('band', 'time', 'y', 'x', )) or from a list of samples (dims=('band', 'time', 'sample_id', )) and collecting all of the estimated segments into a single one-dimensional array (dims=('segment_id', )).

The function stems.parallel.map_collect_1d() is designed to help with this task:

stems.parallel.map_collect_1d(core_dims[, …])

Decorator that maps a function across pixels and concatenates the result
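The sketch below walks through this map-and-concatenate pattern by hand with xarray, without the decorator. The toy estimate_segments function and the random data cube are made up purely for illustration:

    import numpy as np
    import xarray as xr


    def estimate_segments(pixel):
        """Toy per-pixel function standing in for something like CCDC.

        Returns a result indexed by a 1-d 'segment' dimension.
        """
        n_segments = 2  # pretend every pixel yields two segments
        return xr.Dataset({
            'start': ('segment', pixel['time'].values[:n_segments]),
            'mean': ('segment', [float(pixel.mean())] * n_segments),
        })


    # Fake (band, time, y, x) data cube
    data = xr.DataArray(
        np.random.rand(2, 10, 3, 3),
        dims=('band', 'time', 'y', 'x'),
        coords={'time': np.arange(10)},
    )

    # Map across the non-core (y, x) dimensions and concatenate every
    # per-pixel result into one 'segment'-dimensioned array -- the pattern
    # map_collect_1d is described as automating
    results = []
    for y in range(data.sizes['y']):
        for x in range(data.sizes['x']):
            results.append(estimate_segments(data.isel(y=y, x=x)))

    segments = xr.concat(results, dim='segment')
    print(segments)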