Persistence

Use case

joblib.dump() and joblib.load() provide a replacement for pickle to work efficiently on arbitrary Python objects containing large data, in particular large numpy arrays.

A simple example

First create a temporary directory:

>>> from tempfile import mkdtemp
>>> savedir = mkdtemp()
>>> import os
>>> filename = os.path.join(savedir, 'test.joblib')

Then create an object to be persisted:

>>> import numpy as np
>>> to_persist = [('a', [1, 2, 3]), ('b', np.arange(10))]

which is saved into filename:

>>> import joblib
>>> joblib.dump(to_persist, filename)  
['...test.joblib']

The object can then be reloaded from the file:

>>> joblib.load(filename)
[('a', [1, 2, 3]), ('b', array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))]

Persistence in file objects

Instead of filenames, joblib.dump() and joblib.load() functions also accept file objects:

>>> with open(filename, 'wb') as fo:  
...    joblib.dump(to_persist, fo)
>>> with open(filename, 'rb') as fo:  
...    joblib.load(fo)
[('a', [1, 2, 3]), ('b', array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))]

Compressed joblib pickles

Setting the compress argument to True in joblib.dump() will allow to save space on disk:

>>> joblib.dump(to_persist, filename + '.compressed', compress=True)  
['...test.joblib.compressed']

If the filename extension corresponds to one of the supported compression methods, the compressor will be used automatically:

>>> joblib.dump(to_persist, filename + '.z')  
['...test.joblib.z']

By default, joblib.dump() uses the zlib compression method as it gives the best tradeoff between speed and disk space. The other supported compression methods are ‘gzip’, ‘bz2’, ‘lzma’ and ‘xz’:

>>> # Dumping in a gzip compressed file using a compress level of 3.
>>> joblib.dump(to_persist, filename + '.gz', compress=('gzip', 3))  
['...test.joblib.gz']
>>> joblib.load(filename + '.gz')
[('a', [1, 2, 3]), ('b', array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))]
>>> joblib.dump(to_persist, filename + '.bz2', compress=('bz2', 3))  
['...test.joblib.bz2']
>>> joblib.load(filename + '.bz2')
[('a', [1, 2, 3]), ('b', array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))]

The compress parameter of the joblib.dump() function also accepts a string corresponding to the name of the compressor used. When using this, the default compression level is used by the compressor:

>>> joblib.dump(to_persist, filename + '.gz', compress='gzip')  
['...test.joblib.gz']

Note

Lzma and Xz compression methods are only available for python versions >= 3.3.

Compressor files provided by the python standard library can also be used to compress pickle, e.g gzip.GzipFile, bz2.BZ2File, lzma.LZMAFile:

>>> # Dumping in a gzip.GzipFile object using a compression level of 3.
>>> import gzip
>>> with gzip.GzipFile(filename + '.gz', 'wb', compresslevel=3) as fo:  
...    joblib.dump(to_persist, fo)
>>> with gzip.GzipFile(filename + '.gz', 'rb') as fo:  
...    joblib.load(fo)
[('a', [1, 2, 3]), ('b', array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))]

If the lz4 package is installed, this compression method is automatically available with the dump function.

>>> joblib.dump(to_persist, filename + '.lz4')  
['...test.joblib.lz4']
>>> joblib.load(filename + '.lz4')
[('a', [1, 2, 3]), ('b', array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))]

Note

LZ4 compression is only available with python major versions >= 3

More details can be found in the joblib.dump() and joblib.load() documentation.

Registering extra compressors

Joblib provides joblib.register_compressor() in order to extend the list of default compressors available. To fit with Joblib internal implementation and features, such as joblib.load() and joblib.Memory, the registered compressor should implement the Python file object interface.

Compatibility across python versions

Compatibility of joblib pickles across python versions is not fully supported. Note that, for a very restricted set of objects, this may appear to work when saving a pickle with python 2 and loading it with python 3 but relying on it is strongly discouraged.

If you are switching between python versions, you will need to save a different joblib pickle for each python version.

Here are a few examples or exceptions:

  • Saving joblib pickle with python 2, trying to load it with python 3:

    Traceback (most recent call last):
      File "/home/lesteve/dev/joblib/joblib/numpy_pickle.py", line 453, in load
        obj = unpickler.load()
      File "/home/lesteve/miniconda3/lib/python3.4/pickle.py", line 1038, in load
        dispatch[key[0]](self)
      File "/home/lesteve/miniconda3/lib/python3.4/pickle.py", line 1176, in load_binstring
        self.append(self._decode_string(data))
      File "/home/lesteve/miniconda3/lib/python3.4/pickle.py", line 1158, in _decode_string
        return value.decode(self.encoding, self.errors)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 1024: ordinal not in range(128)
    
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/home/lesteve/dev/joblib/joblib/numpy_pickle.py", line 462, in load
        raise new_exc
      ValueError: You may be trying to read with python 3 a joblib pickle generated with python 2. This is not feature supported by joblib.
    
  • Saving joblib pickle with python 3, trying to load it with python 2:

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "joblib/numpy_pickle.py", line 453, in load
        obj = unpickler.load()
      File "/home/lesteve/miniconda3/envs/py27/lib/python2.7/pickle.py", line 858, in load
        dispatch[key](self)
      File "/home/lesteve/miniconda3/envs/py27/lib/python2.7/pickle.py", line 886, in load_proto
        raise ValueError, "unsupported pickle protocol: %d" % proto
    ValueError: unsupported pickle protocol: 3