Using Dask for single-machine parallel computing¶
This example shows the simplest usage of the Dask backend on your local machine.
This is useful for prototyping a solution, to later be run on a truly distributed Dask cluster, as the only change needed is the cluster class.
Another realistic usage scenario: combining dask code with joblib code, for instance using dask for preprocessing data, and scikit-learn for machine learning. In such a setting, it may be interesting to use distributed as a backend scheduler for both dask and joblib, to orchestrate the computation.
Setup the distributed client¶
from dask.distributed import Client, LocalCluster # replace with whichever cluster class you're using # https://docs.dask.org/en/stable/deploying.html#distributed-computing cluster = LocalCluster() # connect client to your cluster client = Client(cluster) # Monitor your computation with the Dask dashboard print(client.dashboard_link)
Run parallel computation using dask.distributed¶
import time import joblib def long_running_function(i): time.sleep(0.1) return i
The verbose messages below show that the backend is indeed the dask.distributed one
[Parallel(n_jobs=-1)]: Using backend DaskDistributedBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 1 tasks | elapsed: 0.2s [Parallel(n_jobs=-1)]: Batch computation too fast (0.15952348709106445s.) Setting batch_size=2. [Parallel(n_jobs=-1)]: Done 2 tasks | elapsed: 0.2s [Parallel(n_jobs=-1)]: Done 3 tasks | elapsed: 0.3s [Parallel(n_jobs=-1)]: Done 4 tasks | elapsed: 0.3s [Parallel(n_jobs=-1)]: Done 6 tasks | elapsed: 0.5s [Parallel(n_jobs=-1)]: Done 8 out of 10 | elapsed: 0.5s remaining: 0.1s [Parallel(n_jobs=-1)]: Done 10 out of 10 | elapsed: 0.6s finished
Total running time of the script: (0 minutes 1.886 seconds)