PyPlant is a simple coroutine-based framework for writing data processing pipelines. Its goal is to simplify caching of intermediate results and to avoid re-running expensive early stages of a pipeline when only the later stages have changed.
Given a set of Python functions that consume and produce data, it automatically runs them in the correct order and caches intermediate results. When the pipeline is executed again, only the necessary parts are re-run.
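The idea of ordering functions by what they consume and skipping cached results can be illustrated with a minimal standalone sketch. This is not PyPlant's API: the helper `run_pipeline`, the stage functions, and the convention of naming dependencies through parameter names are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: NOT PyPlant's API. All names here are hypothetical.
import inspect
from graphlib import TopologicalSorter  # Python 3.9+


def load():
    return [1, 2, 3]


def square(load):
    # Consumes the output of `load` (matched by parameter name).
    return [x * x for x in load]


def total(square):
    # Consumes the output of `square`.
    return sum(square)


def run_pipeline(stages, cache):
    """Run stages in dependency order, skipping any whose result is cached.

    Returns the names of the stages that were actually executed.
    """
    by_name = {f.__name__: f for f in stages}
    # A stage depends on the stages named by its parameters.
    deps = {name: list(inspect.signature(f).parameters)
            for name, f in by_name.items()}
    executed = []
    for name in TopologicalSorter(deps).static_order():
        if name in cache:
            continue  # result already available, no need to re-run
        cache[name] = by_name[name](*(cache[d] for d in deps[name]))
        executed.append(name)
    return executed


cache = {}
first_run = run_pipeline([total, square, load], cache)

# Pretend only the last stage changed: drop its cached result and run again.
del cache["total"]
second_run = run_pipeline([total, square, load], cache)

print(first_run)   # ['load', 'square', 'total']
print(second_run)  # ['total']
```

In this sketch the cache is a plain in-memory dictionary; the sketch does not persist results between runs or detect changes on its own.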
PyPlant was designed with the following considerations in mind:
Simple: Quick to learn, no custom language or workflow design programs. Start prototyping right away.
DRY: Function code is metadata. No need to write execution graphs or external metadata. It just works (tm).
Automatic: No need to manually re-run outdated parts.
Large data: Handle data that doesn't fit into memory. Persist between runs.
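One way to picture "function code is metadata" combined with persistence between runs is to fingerprint a function's compiled code and store it next to the result on disk. The sketch below is an assumption about how such a scheme could look, not a description of PyPlant's implementation; `fingerprint`, `run_cached`, and the pickle file layout are hypothetical. It deliberately ignores nested code objects, globals, and upstream dependencies.

```python
# Illustrative sketch only: NOT PyPlant's implementation. All names are hypothetical.
import hashlib
import pickle
import tempfile
from pathlib import Path


def fingerprint(func):
    """Hash a function's bytecode, referenced names, and simple constants."""
    code = func.__code__
    # Skip nested code objects: their repr is not stable between runs.
    consts = [c for c in code.co_consts if not hasattr(c, "co_code")]
    payload = code.co_code + repr(consts).encode() + repr(code.co_names).encode()
    return hashlib.sha256(payload).hexdigest()


def run_cached(func, cache_dir):
    """Return (result, was_executed), reusing the stored result if func is unchanged."""
    path = Path(cache_dir) / f"{func.__name__}.pkl"
    if path.exists():
        saved_fingerprint, result = pickle.loads(path.read_bytes())
        if saved_fingerprint == fingerprint(func):
            return result, False  # code unchanged: reuse the stored result
    result = func()
    path.write_bytes(pickle.dumps((fingerprint(func), result)))
    return result, True


def stage():
    return 6 * 7


cache_dir = tempfile.mkdtemp()
first = run_cached(stage, cache_dir)   # executed and stored
second = run_cached(stage, cache_dir)  # loaded from disk


def stage():  # an "edited" version of the same stage
    return 6 * 8


third = run_cached(stage, cache_dir)   # code changed, so it runs again

print(first, second, third)  # (42, True) (42, False) (48, True)
```

Because the fingerprint is derived from the function itself, no separate metadata file or execution graph has to be kept in sync by hand in this sketch.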
PyPlant can be installed from PyPI: pip install pyplant
For documentation, see README.md.