Using dask distributed for single-machine parallel computing
This example shows the simplest usage of the dask distributed backend, on the local computer.
This is useful for prototyping a solution that will later run on a truly distributed cluster, since the only change needed is the address of the scheduler.
Another realistic usage scenario: combining dask code with joblib code, for instance using dask to preprocess data and scikit-learn for machine learning. In such a setting, it can be useful to use distributed as the backend scheduler for both dask and joblib, so that the whole computation is orchestrated consistently.
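As a self-contained sketch of that scenario (assuming scikit-learn and dask.distributed are installed; the client setup is covered in the next section), scikit-learn's internal joblib calls can be routed to the dask scheduler:

from dask.distributed import Client

import joblib
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

client = Client(processes=False)  # local cluster, see the next section

X, y = make_classification(n_samples=200, random_state=0)
search = GridSearchCV(SVC(), {'C': [0.1, 1, 10]}, cv=3)

# scikit-learn parallelizes the grid search with joblib internally;
# the 'dask' backend sends those tasks to the distributed scheduler
with joblib.parallel_backend('dask'):
    search.fit(X, y)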
Set up the distributed client
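A minimal sketch of the client setup: passing processes=False keeps the workers in the current process, which is convenient for prototyping on a single machine. To run on a truly distributed cluster, only the address passed to Client changes.

from dask.distributed import Client

# Create a local 'cluster' on this machine.
client = Client(processes=False)

# To run on a truly distributed cluster, only this line changes:
# client = Client('scheduler-address:8786')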
Run parallel computation using dask.distributed
import time

import joblib


def long_running_function(i):
    time.sleep(0.1)
    return i
The verbose messages below show that the backend is indeed dask.distributed:
with joblib.parallel_backend('dask'):
    joblib.Parallel(verbose=100)(
        joblib.delayed(long_running_function)(i)
        for i in range(10))
[Parallel(n_jobs=-1)]: Using backend DaskDistributedBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done 1 tasks | elapsed: 0.3s
[Parallel(n_jobs=-1)]: Done 2 tasks | elapsed: 0.3s
[Parallel(n_jobs=-1)]: Done 3 tasks | elapsed: 0.4s
[Parallel(n_jobs=-1)]: Done 4 tasks | elapsed: 0.4s
[Parallel(n_jobs=-1)]: Done 5 tasks | elapsed: 0.5s
[Parallel(n_jobs=-1)]: Done 6 tasks | elapsed: 0.5s
[Parallel(n_jobs=-1)]: Done 7 tasks | elapsed: 0.6s
[Parallel(n_jobs=-1)]: Done 8 out of 10 | elapsed: 0.6s remaining: 0.1s
[Parallel(n_jobs=-1)]: Done 10 out of 10 | elapsed: 0.7s remaining: 0.0s
[Parallel(n_jobs=-1)]: Done 10 out of 10 | elapsed: 0.7s finished
Progress in the computation can be followed on the distributed web interface; see https://dask.pydata.org/en/latest/diagnostics-distributed.html
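With the client created above, the address of that web interface can be retrieved directly (a small sketch; the exact URL depends on your setup):

# The address of the diagnostic dashboard served by the scheduler;
# open it in a browser to watch tasks as they complete.
print(client.dashboard_link)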
Total running time of the script: (0 minutes 0.910 seconds)