PyPlant: A Python Framework for Cached Function Pipelines

PyPlant is a simple coroutine-based framework for writing data processing pipelines. Its goal is to simplify caching of intermediate results and to avoid re-running expensive early stages of a pipeline when only the later stages have changed.

Given a set of Python functions that consume and produce data, PyPlant automatically runs them in the correct order and caches intermediate results. When the pipeline is executed again, only the necessary parts are re-run.
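The idea of re-running only what changed can be illustrated with a minimal sketch. This is not PyPlant's API: the names `TinyPipeline`, `stage`, `get` and `code_fingerprint` are hypothetical, plain functions stand in for PyPlant's coroutines, and the cache is an in-memory dictionary rather than persistent storage. The sketch assumes that a stage must re-run when its own code or any upstream result changes, and it detects code changes by hashing the function's bytecode.

```python
import hashlib


def code_fingerprint(func):
    """Hash a function's bytecode, constants and referenced names (illustrative only)."""
    code = func.__code__
    payload = code.co_code + repr((code.co_consts, code.co_names)).encode()
    return hashlib.sha256(payload).hexdigest()


class TinyPipeline:
    """Hypothetical stand-in for a cached pipeline; not PyPlant's real interface."""

    def __init__(self, cache=None):
        self.cache = {} if cache is None else cache  # cache key -> stored result
        self.producers = {}  # output name -> (function, input names)
        self.run_log = []    # names of stages that actually executed

    def stage(self, output, inputs=()):
        """Register a function as the producer of `output`."""
        def register(func):
            self.producers[output] = (func, tuple(inputs))
            return func
        return register

    def _resolve(self, name):
        func, inputs = self.producers[name]
        # Resolve dependencies first, so stages run in a valid order.
        resolved = [self._resolve(dep) for dep in inputs]
        # The key combines this stage's code with the keys of its inputs.
        key_source = code_fingerprint(func) + "".join(k for k, _ in resolved)
        key = hashlib.sha256(key_source.encode()).hexdigest()
        if key not in self.cache:
            self.run_log.append(name)
            self.cache[key] = func(*(value for _, value in resolved))
        return key, self.cache[key]

    def get(self, name):
        return self._resolve(name)[1]


shared_cache = {}  # stands in for results persisted between runs

# First "run": nothing is cached, so both stages execute.
first = TinyPipeline(shared_cache)


@first.stage("numbers")
def load_numbers():
    return [1, 2, 3, 4]


@first.stage("total", inputs=["numbers"])
def compute_total(numbers):
    return sum(numbers)


first_result = first.get("total")

# Second "run": only the later stage was edited, so only it executes.
second = TinyPipeline(shared_cache)
second.stage("numbers")(load_numbers)


@second.stage("total", inputs=["numbers"])
def compute_total_v2(numbers):
    return sum(numbers) * 10


second_result = second.get("total")
print(first.run_log, second.run_log)
```

In this sketch the first run executes both stages, while the second run reuses the cached `numbers` result and executes only the edited `total` stage. Hashing bytecode is a simplification chosen to keep the example self-contained; how PyPlant itself detects changes is documented in its README.md.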

PyPlant was designed with the following considerations in mind:

Simple: Quick to learn, no custom language and workflow design programs. Start prototyping right away.
DRY: Function code is metadata. No need to write execution graphs or external metadata. It just works (tm).
Automatic: No need to manually re-run outdated parts.
Large data: Handle data that doesn't fit into memory. Persist between runs.

PyPlant can be installed from PyPI:

    pip install pyplant

For documentation, see README.md.

Identifier
DOI	https://doi.org/10.18419/darus-2249
Metadata Access	https://darus.uni-stuttgart.de/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.18419/darus-2249

Provenance
Creator	Tkachev, Gleb
Publisher	DaRUS
Contributor	Tkachev, Gleb
Publication Year	2022
Funding Reference	DFG EXC 2075 - 390740016
Rights	info:eu-repo/semantics/openAccess
OpenAccess	true
Contact	Tkachev, Gleb (Universität Stuttgart); Tkachev, Gleb

Representation
Resource Type	Dataset
Format	application/octet-stream; text/x-python; text/plain; text/markdown
Size	628; 335; 78; 1066; 1461; 56632; 41286; 8811; 90; 1070; 40; 819; 6061; 1704; 296; 1809; 1662
Version	1.0
Discipline	Other