NumPy memmap in joblib.Parallel#

This example illustrates some features enabled by using a memory map (numpy.memmap) within joblib.Parallel. First, we show that dumping a huge data array to disk before passing it to joblib.Parallel speeds up computation. Then, we show how worker processes can be given write access to a memory-mapped array shared with the parent process.

Speed up processing of a large data array#

We create a large data array for which the average is computed for several slices.

import numpy as np

data = np.random.random((int(1e7),))
window_size = int(5e5)
slices = [
    slice(start, start + window_size)
    for start in range(0, data.size - window_size, int(1e5))
]
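
To make the window layout above concrete, the following minimal sketch rebuilds the same list of slices and inspects it. The helper name make_windows is hypothetical and not part of the original example; it only wraps the list comprehension shown above.

```python
def make_windows(n_items, window_size, step):
    """Return slices of length `window_size`, one every `step` items.

    Hypothetical helper mirroring the list comprehension above. The stop
    value of range() is exclusive, so a window starting exactly at
    n_items - window_size is not generated.
    """
    return [
        slice(start, start + window_size)
        for start in range(0, n_items - window_size, step)
    ]


windows = make_windows(int(1e7), int(5e5), int(1e5))
print(len(windows))  # number of windows
print(windows[0])    # first window
print(windows[-1])   # last window
```

With these parameters, consecutive windows overlap by 400,000 elements, and because the stop of range() is exclusive, the last 100,000 elements of the array are not covered by any window.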

The slow_mean function introduces a time.sleep() call to simulate a more expensive computation, for which parallel computing is beneficial. Parallel computing may not be beneficial for very fast operations because of the extra overhead (creating workers, communication, etc.).

import time


def slow_mean(data, sl):
    """Simulate a time consuming processing."""
    time.sleep(0.01)
    return data[sl].mean()
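
The overhead mentioned above can be observed with a very cheap operation. The sketch below is an illustration only, not part of the original example: it uses math.sqrt as a stand-in for a fast function and prints both timings. The exact figures depend on the machine, so no expected output is given.

```python
import math
import time

from joblib import Parallel, delayed

values = list(range(1000))

# Sequential run: each call is so cheap that the loop is nearly instant.
tic = time.perf_counter()
sequential = [math.sqrt(v) for v in values]
sequential_time = time.perf_counter() - tic

# Parallel run: worker start-up and task dispatch add a fixed cost
# that a cheap function cannot amortize.
tic = time.perf_counter()
parallel = Parallel(n_jobs=2)(delayed(math.sqrt)(v) for v in values)
parallel_time = time.perf_counter() - tic

print(f"sequential: {sequential_time:.4f} s, parallel: {parallel_time:.4f} s")
```

Both runs return the same values; only the elapsed time differs.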

First, we time the sequential computation on our problem.

tic = time.time()
results = [slow_mean(data, sl) for sl in slices]
toc = time.time()
print(
    "\nElapsed time computing the average of couple of slices {:.2f} s".format(
        toc - tic
    )
)

Elapsed time computing the average of couple of slices 0.98 s

joblib.Parallel is used to compute in parallel the average of all slices using 2 workers.

from joblib import Parallel, delayed

tic = time.time()
results = Parallel(n_jobs=2)(delayed(slow_mean)(data, sl) for sl in slices)
toc = time.time()
print(
    "\nElapsed time computing the average of couple of slices {:.2f} s".format(
        toc - tic
    )
)

Elapsed time computing the average of couple of slices 0.66 s

Parallel processing is already faster than sequential processing. A bit more overhead can be removed by dumping the data array to a memmap and passing the memmap to joblib.Parallel.

import os

from joblib import dump, load

folder = "./joblib_memmap"
os.makedirs(folder, exist_ok=True)

data_filename_memmap = os.path.join(folder, "data_memmap")
dump(data, data_filename_memmap)
data = load(data_filename_memmap, mmap_mode="r")

tic = time.time()
results = Parallel(n_jobs=2)(delayed(slow_mean)(data, sl) for sl in slices)
toc = time.time()
print(
    "\nElapsed time computing the average of couple of slices {:.2f} s\n".format(
        toc - tic
    )
)

Elapsed time computing the average of couple of slices 0.52 s

Therefore, dumping a large data array before calling joblib.Parallel can speed up processing by removing some overhead.
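
To see what load returns when mmap_mode="r" is given, the following minimal sketch dumps a small array into a temporary directory and inspects the reloaded object. The file name small_array and the variable names are assumptions chosen for illustration.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "small_array")  # hypothetical file name
    original = np.arange(10, dtype=np.float64)
    dump(original, path)

    # Reload the array as a read-only memory map backed by the dumped file.
    mapped = load(path, mmap_mode="r")
    is_memmap = isinstance(mapped, np.memmap)
    is_writeable = mapped.flags.writeable
    values_match = bool(np.array_equal(mapped, original))

    # Writing through a read-only memmap is rejected by NumPy.
    try:
        mapped[0] = 42.0
        write_failed = False
    except ValueError:
        write_failed = True

    # Drop the reference so the file can be deleted with the directory.
    del mapped

print(is_memmap, is_writeable, values_match, write_failed)
```

Because the workers only need the file path and array metadata to open the same memmap, the array contents do not have to be copied to each worker.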

Writable memmap for shared memory joblib.Parallel#

slow_mean_write_output computes the mean of a given slice, as in the previous example. However, the resulting mean is written directly into the output array instead of being returned.

def slow_mean_write_output(data, sl, output, idx):
    """Simulate a time consuming processing."""
    time.sleep(0.005)
    res_ = data[sl].mean()
    print("[Worker %d] Mean for slice %d is %f" % (os.getpid(), idx, res_))
    output[idx] = res_

Build the path of the file that will back the output memmap, inside the folder created in the previous section.

output_filename_memmap = os.path.join(folder, "output_memmap")

Pre-allocate a writable shared memory map as a container for the results of the parallel computation.

output = np.memmap(
    output_filename_memmap, dtype=data.dtype, shape=len(slices), mode="w+"
)
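
The behaviour of mode="w+" can be checked in isolation. The sketch below is an illustration under assumed names (buffer.dat, writer, reader): it creates a small memmap, writes to it, flushes, and reopens the same file read-only.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "buffer.dat")  # hypothetical file name

    # mode="w+" creates (or overwrites) the file; shape is then required.
    writer = np.memmap(path, dtype=np.float64, shape=(4,), mode="w+")
    initial = writer.tolist()  # contents of the freshly created file
    writer[:] = [1.0, 2.0, 3.0, 4.0]
    writer.flush()  # push pending changes to the file on disk

    # A second, read-only mapping of the same file sees the written values.
    reader = np.memmap(path, dtype=np.float64, shape=(4,), mode="r")
    read_back = reader.tolist()

    # Release both mappings before the directory is removed.
    del writer, reader

print(initial)
print(read_back)
```

This is the property the parallel example relies on: every process that maps the same file sees the same buffer.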

data is replaced by its memory mapped version. Note that the buffer has already been dumped in the previous section.

data = load(data_filename_memmap, mmap_mode="r")

Launch the worker processes to perform the computation concurrently.

Parallel(n_jobs=2)(
    delayed(slow_mean_write_output)(data, sl, output, idx)
    for idx, sl in enumerate(slices)
)

[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]

Compare the results from the output buffer with the expected results.

print("\nExpected means computed in the parent process:\n {}".format(np.array(results)))
print("\nActual means computed by the worker processes:\n {}".format(output))

Expected means computed in the parent process:
 [0.49925642 0.49965382 0.49957894 0.49974917 0.49959612 0.49977957
 0.50005296 0.50034048 0.50025311 0.49975596 0.49980773 0.49957412
 0.49954919 0.49984438 0.50030185 0.50001727 0.4999824  0.49986012
 0.49976142 0.49982924 0.50017835 0.50001951 0.50006107 0.50026438
 0.50009822 0.49999983 0.50002907 0.4999548  0.50013093 0.50042799
 0.50010194 0.50034742 0.5005871  0.50073321 0.50068576 0.50082524
 0.50090392 0.50043561 0.49989374 0.50001712 0.50002233 0.5003206
 0.500452   0.50037076 0.49984218 0.49994431 0.49974251 0.49960869
 0.50008934 0.50050593 0.50036316 0.4998497  0.4999725  0.49950465
 0.49922001 0.49917603 0.49945979 0.49938025 0.49927956 0.49962293
 0.49998899 0.49998582 0.50029946 0.50044796 0.50039655 0.50009788
 0.5001613  0.49976874 0.49971321 0.49977236 0.49969951 0.49924363
 0.49932935 0.49929335 0.49879464 0.49886569 0.4993586  0.49941511
 0.49975193 0.50003907 0.50034222 0.49996841 0.5001442  0.50001593
 0.50019617 0.49994495 0.50044793 0.50040624 0.50019778 0.49981505
 0.49966658 0.49955882 0.49971944 0.49997202 0.50032397]

Actual means computed by the worker processes:
 [0.49925642 0.49965382 0.49957894 0.49974917 0.49959612 0.49977957
 0.50005296 0.50034048 0.50025311 0.49975596 0.49980773 0.49957412
 0.49954919 0.49984438 0.50030185 0.50001727 0.4999824  0.49986012
 0.49976142 0.49982924 0.50017835 0.50001951 0.50006107 0.50026438
 0.50009822 0.49999983 0.50002907 0.4999548  0.50013093 0.50042799
 0.50010194 0.50034742 0.5005871  0.50073321 0.50068576 0.50082524
 0.50090392 0.50043561 0.49989374 0.50001712 0.50002233 0.5003206
 0.500452   0.50037076 0.49984218 0.49994431 0.49974251 0.49960869
 0.50008934 0.50050593 0.50036316 0.4998497  0.4999725  0.49950465
 0.49922001 0.49917603 0.49945979 0.49938025 0.49927956 0.49962293
 0.49998899 0.49998582 0.50029946 0.50044796 0.50039655 0.50009788
 0.5001613  0.49976874 0.49971321 0.49977236 0.49969951 0.49924363
 0.49932935 0.49929335 0.49879464 0.49886569 0.4993586  0.49941511
 0.49975193 0.50003907 0.50034222 0.49996841 0.5001442  0.50001593
 0.50019617 0.49994495 0.50044793 0.50040624 0.50019778 0.49981505
 0.49966658 0.49955882 0.49971944 0.49997202 0.50032397]
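
Rather than comparing the two printed arrays by eye, the check can be done programmatically. The sketch below is a small stand-alone illustration with made-up data (small_data, small_slices, and buffer are hypothetical names); it fills a result buffer by index, as the workers do, and compares it with means computed directly.

```python
import numpy as np

rng = np.random.default_rng(0)
small_data = rng.random(1000)
small_slices = [slice(start, start + 100) for start in range(0, 900, 50)]

# Reference means, computed directly as a list.
expected = np.array([small_data[sl].mean() for sl in small_slices])

# Means written by index into a pre-allocated buffer.
buffer = np.zeros(len(small_slices))
for idx, sl in enumerate(small_slices):
    buffer[idx] = small_data[sl].mean()

# Raises AssertionError if the two arrays differ beyond the tolerance.
np.testing.assert_allclose(buffer, expected)
max_abs_diff = float(np.max(np.abs(buffer - expected)))
print(max_abs_diff)
```

In the example above, the same call would take output and np.array(results) as its two arguments.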

Clean-up the memmap#

Remove the memmap files that we created. This might fail on Windows due to file permissions.

import shutil

try:
    shutil.rmtree(folder)
except OSError:  # e.g. files still mapped or locked on Windows
    print("Could not clean-up automatically.")
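
As an alternative to manual removal, a temporary directory can own the memmap files so that they are deleted when the block exits. This is a sketch of a different technique from the one used above, relying on the standard library's tempfile.TemporaryDirectory; the names tmp_folder, scratch_path, and scratch are hypothetical.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp_folder:
    scratch_path = os.path.join(tmp_folder, "scratch_memmap")
    scratch = np.memmap(scratch_path, dtype=np.float64, shape=(3,), mode="w+")
    scratch[:] = 1.0
    total = float(scratch.sum())
    # Drop the reference so the mapping is released before the
    # directory is deleted (relevant on Windows in particular).
    del scratch

folder_removed = not os.path.exists(tmp_folder)
print(total, folder_removed)
```

The directory and its contents are removed when the with block ends, so no explicit shutil.rmtree call is needed.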

Total running time of the script: (0 minutes 2.513 seconds)
