Why joblib: project goals#

Benefits of pipelines#

Pipeline processing systems can provide a set of useful features:

Data-flow programming for performance#

  • On-demand computing: in pipeline systems such as LabVIEW or VTK, calculations are performed only when an output requires them, and are rerun only when their inputs change.

  • Transparent parallelization: the topology of a pipeline can be inspected to deduce which operations can run in parallel (as in purely functional programming, where steps have no side effects).
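As a rough illustration of deducing parallelism from a pipeline topology, the sketch below groups the steps of a small dependency graph into batches whose members do not depend on each other. The `pipeline` graph, its step names, and the `parallel_batches` helper are hypothetical and only for illustration; the sketch uses the standard library (`graphlib`, Python 3.9+), not joblib.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
pipeline = {
    "load": set(),
    "filter": {"load"},
    "stats": {"load"},
    "report": {"filter", "stats"},
}


def parallel_batches(graph):
    """Group steps into batches; steps within a batch are independent."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        # Every step returned here has all of its dependencies completed,
        # so the whole batch could be dispatched to workers at once.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = parallel_batches(pipeline)
print(batches)
```

Here `filter` and `stats` land in the same batch because both depend only on `load`. In joblib itself, independent calls are dispatched with `Parallel` and `delayed`.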

Provenance tracking to understand the code#

  • Tracking of data and computations: This enables the reproducibility of a computational experiment.

  • Inspecting data flow: Inspecting intermediate results helps debugging and understanding.

Joblib’s approach#

Functions are the simplest abstraction used by everyone. Pipeline jobs (or tasks) in Joblib are made of decorated functions.

Tracking parameters in a meaningful way requires specifying a data model. Joblib gives up on that and instead hashes the parameters, for performance and robustness.
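The idea of a decorated function whose results are tracked by a hash of its arguments can be sketched as follows. This is a minimal in-memory illustration, not joblib's implementation: the `cached` decorator, its `store` attribute, and the choice of pickling plus SHA-256 are assumptions made for the example.

```python
import functools
import hashlib
import pickle


def cached(func):
    """Hypothetical decorator: reuse results keyed by a hash of the arguments."""
    store = {}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Hash the serialized arguments instead of requiring a data model.
        payload = pickle.dumps((func.__qualname__, args, sorted(kwargs.items())))
        key = hashlib.sha256(payload).hexdigest()
        if key not in store:
            # Compute only on first use of this exact set of arguments.
            store[key] = func(*args, **kwargs)
        return store[key]

    wrapper.store = store  # exposed only so the cache can be inspected
    return wrapper


calls = []  # records every real execution of the function body


@cached
def square(x):
    calls.append(x)
    return x * x


first = square(3)   # computed
second = square(3)  # served from the cache, body not rerun
third = square(4)   # new argument, computed
```

After these three calls, `calls` contains only `3` and `4`, since the repeated call was answered from the cache. In joblib itself, the corresponding decorator is `Memory.cache`, which persists results to disk rather than keeping them in a dictionary.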

Design choices#

  • No dependencies other than Python

  • Robust, well-tested code, at the cost of functionality

  • Fast and suitable for scientific computing on large datasets, without changing the original code

  • Only local imports: embed joblib in your code by copying it