Reproducibility of scientific simulations run in high-performance computing (HPC) environments is a persistent challenge because software and hardware stacks vary between systems. Differences in software versions or hardware-specific optimizations often lead to discrepancies in simulation outputs. Linux containers are commonly used to standardize software environments, but tools like Docker do not build images reproducibly, so the resulting binary image blobs must be archived for future use. This approach turns containers into black boxes and makes it impossible to verify how the contained software was built.
In the linked paper, we demonstrate how we use GNU Guix to build our software stack bit-for-bit reproducibly from a source bootstrap. Our approach incorporates a portable OpenMPI implementation, optimized software builds, and deployment via Apptainer images across three HPC environments. Using the OpenGeoSys software project as an example, we show that our reproducible software stack enables consistent multi-physics simulations and complex workflows on diverse HPC platforms. To record the provenance of our findings, we used the AiiDA workflow manager.
This dataset includes the complete AiiDA provenance database underlying the results presented in the paper. The AiiDA workflow itself is defined in, and can be reproduced with, this repository: https://gitlab.opengeosys.org/bilke/hpc-container-study.