API¶
Rechunk Function¶
The main function exposed by rechunker is rechunker.rechunk().
- rechunker.rechunk(source, target_chunks, max_mem, target_store, target_options=None, temp_store=None, temp_options=None, executor: str | CopySpecExecutor = 'dask') → Rechunked¶
Rechunk a Zarr Array or Group, a Dask Array, or an Xarray Dataset.
- Parameters:
- source: zarr.Array, zarr.Group, dask.array.Array, or xarray.Dataset
Named dimensions in the Zarr arrays will be parsed according to the Xarray Zarr Encoding Specification.
- target_chunks: tuple, dict, or None
The desired chunks of the array after rechunking. The structure depends on source.
For a single array source, target_chunks can be either a tuple (e.g. (20, 5, 3)) or a dictionary (e.g. {'time': 20, 'lat': 5, 'lon': 3}). Dictionary syntax requires the dimension names to be present in the Zarr Array attributes (see the Xarray Zarr Encoding Specification). A value of None means that the array will be copied with no change to its chunk structure.
For a group of arrays, a dict is required. The keys correspond to array names and the values are target_chunks arguments for each array, for example {'foo': (20, 10), 'bar': {'x': 3, 'y': 5}, 'baz': None}. All arrays you want to rechunk must be explicitly named; arrays that are not present in the target_chunks dict will be ignored. (A group example is sketched after the Returns section below.)
- max_mem: str or int
The amount of memory (in bytes) that workers are allowed to use. A string (e.g. '100MB') can also be used.
- target_store: str, MutableMapping, or zarr.Store object
The location in which to store the final, rechunked result. Will be passed directly to zarr.creation.create().
- target_options: Dict, optional
Additional keyword arguments used to control array storage. If the source is xarray.Dataset, then these options will be used to encode variables in the same manner as the encoding parameter in xarray.Dataset.to_zarr(). Otherwise, these options will be passed to zarr.creation.create(). The structure depends on source.
For a single array source, this should be a single dict, such as {'compressor': zarr.Blosc(), 'order': 'F'}.
For a group of arrays, a nested dict is required, with values like the above keyed by array name. For example, {'foo': {'compressor': zarr.Blosc(), 'order': 'F'}, 'bar': {'compressor': None}}.
- temp_store: str, MutableMapping, or zarr.Store object, optional
Location of temporary store for intermediate data. Can be deleted once rechunking is complete.
- temp_options: Dict, optional
Options with the same semantics as target_options, but applied to temp_store rather than target_store. Defaults to target_options and has no effect when the source is of type xarray.Dataset.
- executor: str or rechunker.types.Executor
Implementation of the execution engine for copying between zarr arrays. Supplying a custom Executor is currently even more experimental than the rest of Rechunker: we expect the interface to evolve as we add more executors and make no guarantees of backwards compatibility. The currently implemented executors are 'dask', 'beam', 'prefect', 'python', and 'pywren'.
- Returns:
- rechunked: Rechunked object
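For the group-of-arrays case described under target_chunks, a minimal sketch follows. The group, the array names (foo, bar), the shapes, chunk sizes, and store paths are illustrative assumptions for this example, not part of the documented API.

import zarr
from rechunker import rechunk

# A minimal sketch, assuming a local Zarr group with two arrays named
# "foo" and "bar" (names, shapes, and store paths are illustrative).
group = zarr.group(store="group_source.zarr")
group.create_dataset("foo", shape=(100, 100), chunks=(100, 1), dtype="f8")
group.create_dataset("bar", shape=(100, 100), chunks=(100, 1), dtype="f8")

# Per-array target chunks; arrays omitted from this dict would be ignored.
target_chunks = {"foo": (1, 100), "bar": (10, 10)}

rechunked = rechunk(
    group,
    target_chunks=target_chunks,
    max_mem="100MB",
    target_store="group_target.zarr",
    temp_store="group_temp.zarr",
)
rechunked.execute()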
The Rechunked Object¶
rechunk returns a Rechunked object.
- class rechunker.Rechunked(executor, plan, source, intermediate, target)¶
A delayed rechunked result.
This represents the rechunking plan, and when executed will perform the rechunking and return the rechunked array.
Examples
>>> source = zarr.ones((4, 4), chunks=(2, 2), store="source.zarr")
>>> intermediate = "intermediate.zarr"
>>> target = "target.zarr"
>>> rechunked = rechunk(source, target_chunks=(4, 1), target_store=target,
...                     max_mem=256000,
...                     temp_store=intermediate)
>>> rechunked
<Rechunked>
* Source      : <zarr.core.Array (4, 4) float64>
* Intermediate: dask.array<from-zarr, ... >
* Target      : <zarr.core.Array (4, 4) float64>
>>> rechunked.execute()
<zarr.core.Array (4, 4) float64>
- Attributes:
plan
Returns the executor-specific scheduling plan.
Methods
execute(**kwargs)
Execute the rechunking.
Note
You must call execute() on the Rechunked object in order to actually perform the rechunking operation.
Warning
You must manually delete the intermediate store when execute is finished.
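Continuing the doctest example above, the following is a minimal sketch of executing the plan and then cleaning up the intermediate store. It assumes the intermediate store is a local directory named intermediate.zarr, so removing that directory is sufficient.

import shutil

# Nothing is copied until execute() is called.
result = rechunked.execute()

# The intermediate store is not cleaned up automatically. For a local
# directory-backed store named "intermediate.zarr" (an assumption for
# this sketch), removing the directory deletes the intermediate data.
shutil.rmtree("intermediate.zarr", ignore_errors=True)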
Executors¶
Rechunking plans can be executed on a variety of backends. The following table lists the current options.
BeamExecutor | An execution engine based on Apache Beam.
- class rechunker.executors.beam.BeamExecutor(*args, **kwds)¶
An execution engine based on Apache Beam.
Supports copying between any arrays that implement __getitem__ and __setitem__ for tuples of slice objects. Arrays must also be serializable by Beam (i.e., with pickle).
Execution plans for BeamExecutor are beam.PTransform objects.
Methods
execute_plan(plan, **kwargs)
Execute a plan.
prepare_plan(specs)
Convert copy specifications into a plan.
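As a sketch of choosing an executor, the call below selects one of the documented engines by passing a string for the executor argument. The store paths, array shape, and chunk sizes are illustrative assumptions for this example.

import zarr
from rechunker import rechunk

# Illustrative source array (path, shape, and chunks are assumptions).
source = zarr.ones((1000, 1000), chunks=(1000, 1), store="exec_source.zarr")

# Select the execution engine by name; 'dask' is the default, and
# 'beam', 'prefect', 'python', and 'pywren' are the other documented options.
# A custom Executor instance may also be supplied instead of a string.
rechunked = rechunk(
    source,
    target_chunks=(1, 1000),
    max_mem="100MB",
    target_store="exec_target.zarr",
    temp_store="exec_temp.zarr",
    executor="python",
)
rechunked.execute()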