Using Dask for single-machine parallel computing

This example shows the simplest usage of the Dask backend on your local machine.

This is useful for prototyping a solution, to later be run on a truly distributed Dask cluster, as the only change needed is the cluster class.

Another realistic usage scenario: combining dask code with joblib code, for instance using dask for preprocessing data, and scikit-learn for machine learning. In such a setting, it may be interesting to use distributed as a backend scheduler for both dask and joblib, to orchestrate the computation.

Setup the distributed client

from dask.distributed import Client, LocalCluster

# replace with whichever cluster class you're using
cluster = LocalCluster()
# connect client to your cluster
client = Client(cluster)

# Monitor your computation with the Dask dashboard

Run parallel computation using dask.distributed

import time
import joblib

def long_running_function(i):
    return i

The verbose messages below show that the backend is indeed the dask.distributed one

with joblib.parallel_config(backend="dask"):
        joblib.delayed(long_running_function)(i) for i in range(10)
[Parallel(n_jobs=-1)]: Using backend DaskDistributedBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done   1 tasks      | elapsed:    0.2s
[Parallel(n_jobs=-1)]: Batch computation too fast (0.15892434120178223s.) Setting batch_size=2.
[Parallel(n_jobs=-1)]: Done   2 tasks      | elapsed:    0.2s
[Parallel(n_jobs=-1)]: Done   3 tasks      | elapsed:    0.3s
[Parallel(n_jobs=-1)]: Done   4 tasks      | elapsed:    0.3s
[Parallel(n_jobs=-1)]: Done   6 tasks      | elapsed:    0.5s
[Parallel(n_jobs=-1)]: Done   8 out of  10 | elapsed:    0.5s remaining:    0.1s
[Parallel(n_jobs=-1)]: Done  10 out of  10 | elapsed:    0.6s finished

Total running time of the script: (0 minutes 1.952 seconds)

Gallery generated by Sphinx-Gallery