PyPlant: A Python Framework for Cached Function Pipelines


PyPlant is a simple coroutine-based framework for writing data processing pipelines. Its goal is to simplify caching of intermediate results and to avoid re-running expensive early stages of the pipeline when only the later stages have changed.

Given a set of Python functions that consume and produce data, PyPlant automatically runs them in a correct order and caches intermediate results. When the pipeline is executed again, only the necessary parts are re-run.
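The general idea can be illustrated with a small standard-library sketch. It is not PyPlant's actual API (see README.md for that): `MiniPlant`, `stage`, and `get` are hypothetical names, and the sketch assumes that a stage requests an input by yielding its name and that a change in a stage's bytecode is what invalidates its cached result.

```python
import hashlib
import inspect
import pickle
import tempfile
from pathlib import Path


class MiniPlant:
    """Toy pipeline runner. Hypothetical illustration, NOT PyPlant's API."""

    def __init__(self, cache_dir):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.stages = {}    # output name -> function (or generator function)
        self.executed = []  # names of stages that actually ran

    def stage(self, output):
        """Decorator registering a function as the producer of `output`."""
        def register(func):
            self.stages[output] = func
            return func
        return register

    @staticmethod
    def _code_hash(func):
        # Assumption: bytecode, constants, and referenced names stand in for
        # "the function code". Nested functions are not handled here.
        code = func.__code__
        payload = repr((code.co_code, code.co_consts, code.co_names))
        return hashlib.sha256(payload.encode()).hexdigest()

    def get(self, name):
        """Return (value, signature), reusing the cache while it is valid."""
        func = self.stages[name]
        code_hash = self._code_hash(func)
        path = self.cache_dir / f"{name}.pkl"

        if path.exists():
            entry = pickle.loads(path.read_bytes())
            # Valid only if the code is unchanged and every input recorded
            # during the last run still has the same signature.
            if entry["code"] == code_hash and all(
                self.get(dep)[1] == sig for dep, sig in entry["inputs"].items()
            ):
                return entry["value"], entry["signature"]

        inputs = {}
        outcome = func()
        if inspect.isgenerator(outcome):
            # Drive the coroutine: each yielded name is resolved (running
            # upstream stages on demand) and the value is sent back in.
            try:
                wanted = next(outcome)
                while True:
                    value, sig = self.get(wanted)
                    inputs[wanted] = sig
                    wanted = outcome.send(value)
            except StopIteration as stop:
                outcome = stop.value

        signature = hashlib.sha256(
            repr((code_hash, sorted(inputs.items()))).encode()
        ).hexdigest()
        entry = {"code": code_hash, "inputs": inputs,
                 "value": outcome, "signature": signature}
        path.write_bytes(pickle.dumps(entry))
        self.executed.append(name)
        return outcome, signature


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        plant = MiniPlant(tmp)

        @plant.stage("numbers")
        def load_numbers():
            return list(range(5))

        @plant.stage("total")
        def compute_total():
            numbers = yield "numbers"  # request an upstream result by name
            return sum(numbers)

        print(plant.get("total")[0], plant.executed)  # 10 ['numbers', 'total']
        plant.executed.clear()
        print(plant.get("total")[0], plant.executed)  # 10 []
```

In this sketch, editing only `compute_total` changes its code hash, so it is re-run while the cached result of `load_numbers` is reused. Editing `load_numbers` changes its signature, which invalidates every stage that consumed it.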

PyPlant was designed with the following considerations in mind:

Simple: Quick to learn, no custom language and workflow design programs. Start prototyping right away.
DRY: Function code is metadata. No need to write execution graphs or external metadata. It just works (tm).
Automatic: No need to manually re-run outdated parts.
Large data: Handle data that doesn't fit into memory. Persist between runs.

PyPlant can be installed from PyPI:

    pip install pyplant

For documentation, see README.md.

Identifier
DOI https://doi.org/10.18419/darus-2249
Metadata Access https://darus.uni-stuttgart.de/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.18419/darus-2249
Provenance
Creator Tkachev, Gleb
Publisher DaRUS
Contributor Tkachev, Gleb
Publication Year 2022
Funding Reference DFG EXC 2075 - 390740016
Rights info:eu-repo/semantics/openAccess
OpenAccess true
Contact Tkachev, Gleb (Universität Stuttgart); Tkachev, Gleb
Representation
Resource Type Dataset
Format application/octet-stream; text/x-python; text/plain; text/markdown
Size 628; 335; 78; 1066; 1461; 56632; 41286; 8811; 90; 1070; 40; 819; 6061; 1704; 296; 1809; 1662
Version 1.0
Discipline Other