.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/nested_parallel_memory.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_nested_parallel_memory.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_nested_parallel_memory.py:

==================================================
Checkpoint using joblib.Memory and joblib.Parallel
==================================================

This example illustrates how to cache intermediate computing results using
:class:`joblib.Memory` within :class:`joblib.Parallel`.

.. GENERATED FROM PYTHON SOURCE LINES 12-17

Embed caching within parallel processing
##############################################################################

It is possible to cache a computationally expensive function executed during
a parallel process. ``costly_compute`` emulates such a time-consuming
function.

.. GENERATED FROM PYTHON SOURCE LINES 17-32

.. code-block:: Python

    import time


    def costly_compute(data, column):
        """Emulate a costly function by sleeping and returning a column."""
        time.sleep(2)
        return data[:, column]


    def data_processing_mean(data, column):
        """Compute the mean of a column."""
        return costly_compute(data, column).mean()

.. GENERATED FROM PYTHON SOURCE LINES 33-36

Create some data. The random seed is fixed to generate deterministic data
across Python sessions. Note that this is not necessary for this specific
example since the memory cache is cleared at the end of the session.

.. GENERATED FROM PYTHON SOURCE LINES 36-41

.. code-block:: Python

    import numpy as np

    rng = np.random.RandomState(42)
    data = rng.randn(int(1e4), 4)

.. GENERATED FROM PYTHON SOURCE LINES 42-44

It is first possible to run the processing without caching or parallel
processing.

.. GENERATED FROM PYTHON SOURCE LINES 44-53
.. code-block:: Python

    start = time.time()
    results = [data_processing_mean(data, col) for col in range(data.shape[1])]
    stop = time.time()

    print('\nSequential processing')
    print('Elapsed time for the entire processing: {:.2f} s'
          .format(stop - start))

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Sequential processing
    Elapsed time for the entire processing: 8.00 s

.. GENERATED FROM PYTHON SOURCE LINES 54-57

``costly_compute`` is expensive to compute and is used as an intermediate
step in ``data_processing_mean``. Therefore, it is interesting to store the
intermediate results from ``costly_compute`` using :class:`joblib.Memory`.

.. GENERATED FROM PYTHON SOURCE LINES 57-65

.. code-block:: Python

    from joblib import Memory

    location = './cachedir'
    memory = Memory(location, verbose=0)
    costly_compute_cached = memory.cache(costly_compute)

.. GENERATED FROM PYTHON SOURCE LINES 66-68

Now, we define ``data_processing_mean_using_cache`` which benefits from the
cache by calling ``costly_compute_cached``.

.. GENERATED FROM PYTHON SOURCE LINES 68-74

.. code-block:: Python

    def data_processing_mean_using_cache(data, column):
        """Compute the mean of a column."""
        return costly_compute_cached(data, column).mean()

.. GENERATED FROM PYTHON SOURCE LINES 75-77

Then, we execute the same processing in parallel, caching the intermediate
results.

.. GENERATED FROM PYTHON SOURCE LINES 77-90

.. code-block:: Python

    from joblib import Parallel, delayed

    start = time.time()
    results = Parallel(n_jobs=2)(
        delayed(data_processing_mean_using_cache)(data, col)
        for col in range(data.shape[1]))
    stop = time.time()

    print('\nFirst round - caching the data')
    print('Elapsed time for the entire processing: {:.2f} s'
          .format(stop - start))

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    First round - caching the data
    Elapsed time for the entire processing: 4.53 s
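Under the hood, :class:`joblib.Memory` hashes the inputs of the decorated
function and stores each output on disk keyed by that hash, so a repeated
call reloads the stored result instead of recomputing it. Below is a
minimal, stdlib-only sketch of that idea; it is a simplified illustration,
not joblib's actual implementation, and ``disk_cache`` is a hypothetical
helper name:

```python
import hashlib
import os
import pickle
import tempfile


def disk_cache(location):
    """Simplified stand-in for joblib.Memory.cache: hash the pickled
    arguments and store each result as a pickle file on disk."""
    os.makedirs(location, exist_ok=True)

    def decorator(func):
        def wrapper(*args):
            key = hashlib.sha256(
                pickle.dumps((func.__name__, args))).hexdigest()
            path = os.path.join(location, key + '.pkl')
            if os.path.exists(path):        # cache hit: reload from disk
                with open(path, 'rb') as f:
                    return pickle.load(f)
            result = func(*args)            # cache miss: compute and store
            with open(path, 'wb') as f:
                pickle.dump(result, f)
            return result
        return wrapper
    return decorator


calls = []


@disk_cache(tempfile.mkdtemp())
def square(x):
    calls.append(x)     # record actual executions to show cache hits
    return x * x


# The second call reloads from disk; `square` only runs once.
print(square(3), square(3), len(calls))  # → 9 9 1
```

Unlike this sketch, :class:`joblib.Memory` also hashes large ``numpy``
arrays efficiently and invalidates cached results when the source code of
the decorated function changes.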
.. GENERATED FROM PYTHON SOURCE LINES 91-95

With two workers, the parallel processing gives nearly a 2x speed-up
compared to the sequential case. By executing the same process again, the
intermediate results obtained by calling ``costly_compute_cached`` will be
loaded from the cache instead of re-executing the function.

.. GENERATED FROM PYTHON SOURCE LINES 95-106

.. code-block:: Python

    start = time.time()
    results = Parallel(n_jobs=2)(
        delayed(data_processing_mean_using_cache)(data, col)
        for col in range(data.shape[1]))
    stop = time.time()

    print('\nSecond round - reloading from the cache')
    print('Elapsed time for the entire processing: {:.2f} s'
          .format(stop - start))

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Second round - reloading from the cache
    Elapsed time for the entire processing: 0.01 s

.. GENERATED FROM PYTHON SOURCE LINES 107-114

Reuse intermediate checkpoints
##############################################################################

The intermediate results of ``costly_compute_cached`` are now cached and can
be reused by calling the function again. We define a new processing step
which takes the maximum of the array returned by ``costly_compute_cached``
instead of the mean.

.. GENERATED FROM PYTHON SOURCE LINES 114-131

.. code-block:: Python

    def data_processing_max_using_cache(data, column):
        """Compute the max of a column."""
        return costly_compute_cached(data, column).max()


    start = time.time()
    results = Parallel(n_jobs=2)(
        delayed(data_processing_max_using_cache)(data, col)
        for col in range(data.shape[1]))
    stop = time.time()

    print('\nReusing intermediate checkpoints')
    print('Elapsed time for the entire processing: {:.2f} s'
          .format(stop - start))

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Reusing intermediate checkpoints
    Elapsed time for the entire processing: 0.01 s

.. GENERATED FROM PYTHON SOURCE LINES 132-135

The processing time only corresponds to the execution of the ``max``
function.
The internal call to ``costly_compute_cached`` reloads the results from the
cache.

.. GENERATED FROM PYTHON SOURCE LINES 137-139

Clean-up the cache folder
##############################################################################

.. GENERATED FROM PYTHON SOURCE LINES 139-141

.. code-block:: Python

    memory.clear(warn=False)

.. rst-class:: sphx-glr-timing

**Total running time of the script:** (0 minutes 12.558 seconds)

.. _sphx_glr_download_auto_examples_nested_parallel_memory.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: nested_parallel_memory.ipynb <nested_parallel_memory.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: nested_parallel_memory.py <nested_parallel_memory.py>`

.. only:: html

  .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_