Benchmark dataset for 24-hour stratospheric ozone tendencies (SWIFT-AI-DS)

SWIFT-AI-DS is a benchmark dataset of samples derived from two simulation runs (each 2.5 years long) of the chemistry and transport model ATLAS (Wohltmann and Rex, 2009; Wohltmann et al., 2010). With nearly 200 million samples, it meets the requirements of a labelled dataset and is well suited for training and testing a machine-learning-based surrogate model.

The simulation runs cover two time periods: November 1998 to March 2001 and November 2004 to March 2007.

The dataset covers the entire Earth geographically but is vertically restricted to the lower to middle stratosphere, where the SWIFT approach of 24-hour ozone tendencies (Rex et al., 2014; Kreyling et al., 2017; Wohltmann et al., 2017) can be applied. Applicability was determined from the chemical lifetime of stratospheric ozone, which is a function of solar irradiance and altitude. The resulting range of applicability can be described by a dynamic upper bound (Kreyling et al., 2017). Within the range where the chemical lifetime is longer than 14 days, ozone is not in quasi-chemical equilibrium. Moreover, the dataset focuses on the lower to middle stratosphere because this region makes the largest contribution to the total ozone column.

State-of-the-art physical process models for stratospheric chemistry require enormous computational time. Our research focuses on developing much faster, yet accurate, surrogate models for computing the 24-hour tendencies of stratospheric ozone. Much faster models of stratospheric ozone open up new application areas, such as use in climate models. These surrogate models benefit greatly from the methodological and hardware improvements of the last decade.

Each simulation run uses the full stratospheric chemistry model to solve a system of differential equations involving 47 chemical species and 171 chemical reactions at a very high (sub-second) and variable temporal resolution.

The ATLAS model is driven by ECMWF reanalysis data (either ERA-I or ERA5). The air parcel state was sampled at a 24-hour time step (00:00 UTC model time). During postprocessing, each variable is stored as a 24-hour average, as a 24-hour tendency, or as the state at the beginning of the 24-hour time step. The dataset is stored in 12 monthly netCDF files.
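The three storage forms mentioned above (24-hour average, 24-hour tendency, state at the start of the time step) and the 14-day lifetime criterion can be illustrated with a minimal sketch. It is not the ATLAS postprocessing code: the function names, the argument names, and the idea of applying the lifetime criterion as a simple per-sample mask are assumptions made for illustration only.

```python
import numpy as np

def daily_summary(times_h, values):
    """Summarise one 24-hour window of a (hypothetical) air-parcel time series.

    times_h: sample times in hours, first entry at the window start and
             last entry 24 hours later; spacing may be irregular
    values:  the variable sampled at those times
    """
    times_h = np.asarray(times_h, dtype=float)
    values = np.asarray(values, dtype=float)

    start_state = values[0]             # state at the beginning of the step
    tendency = values[-1] - values[0]   # change over the 24-hour step

    # Time-weighted mean via the trapezoidal rule, so that a variable
    # (irregular) internal time step is handled correctly.
    dt = np.diff(times_h)
    area = np.sum(0.5 * (values[1:] + values[:-1]) * dt)
    average = area / (times_h[-1] - times_h[0])

    return {"start": start_state, "tendency": tendency, "average": average}

def applicability_mask(lifetime_days, threshold_days=14.0):
    """True where the chemical lifetime of ozone exceeds the threshold.

    Assumption: the 14-day criterion is applied as a per-sample mask.
    """
    return np.asarray(lifetime_days, dtype=float) > threshold_days
```

For example, `daily_summary([0, 6, 12, 24], temperature)` would return the start value, the 24-hour change, and the time-weighted mean of a hypothetical `temperature` series, and `applicability_mask` could be used to select samples before training.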

The benchmark dataset consists of training and test data. The variables are described in the document Description_variables.pdf.

Training data: The training data consist of files containing ca. 100 million data samples. Each data sample consists of the input and output features that can be used to train a data-driven model on a regression task.
Input X: choice of variables (see document Description_variables.pdf)
Output y: 24-hour tendency of stratospheric ozone

Test data: Similar to the training data, but comprising ca. 100 million data samples that have not been used for training. They can be used to assess model performance.
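The intended workflow (fit a regression model on the training samples, then assess it on the held-out test samples) can be sketched as follows. This is only an illustration: the data are synthetic stand-ins generated in the script, the three input features are hypothetical, and the linear least-squares fit is a placeholder baseline, not the surrogate model developed by the authors. The actual variable names are listed in Description_variables.pdf.

```python
import numpy as np

def fit_linear_baseline(X, y):
    """Least-squares fit of y ~ X @ w + b; returns the weights with b last."""
    A = np.column_stack([X, np.ones(len(X))])  # append a bias column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply the fitted linear model to new inputs."""
    return np.column_stack([X, np.ones(len(X))]) @ coef

def rmse(y_true, y_pred):
    """Root-mean-square error between targets and predictions."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic stand-in data (NOT taken from SWIFT-AI-DS): three hypothetical
# input features and a noise-free linear target playing the role of the
# 24-hour ozone tendency.
rng = np.random.default_rng(0)
true_w = np.array([0.5, -1.0, 2.0])
X_train = rng.normal(size=(1000, 3))
y_train = X_train @ true_w + 0.1
X_test = rng.normal(size=(200, 3))   # samples never seen during fitting
y_test = X_test @ true_w + 0.1

coef = fit_linear_baseline(X_train, y_train)
test_rmse = rmse(y_test, predict(coef, X_test))
print(f"test RMSE: {test_rmse:.3e}")
```

With the real dataset, `X_train`, `y_train`, `X_test` and `y_test` would instead be assembled from the chosen input variables and the ozone tendency in the training and test files. Keeping the test samples out of every fitting step is what makes the reported error a fair estimate of model performance.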

Identifier
DOI https://doi.org/10.1594/PANGAEA.939121
Related Identifier https://doi.org/10.5194/gmd-11-753-2018
Related Identifier https://doi.org/10.1023/B:JOCH.0000012284.28801.b1
Related Identifier https://doi.org/10.5194/acp-14-6545-2014
Related Identifier https://doi.org/10.5194/gmd-3-585-2010
Related Identifier https://doi.org/10.5194/gmd-10-2671-2017
Related Identifier https://doi.org/10.5194/gmd-2-153-2009
Metadata Access https://ws.pangaea.de/oai/provider?verb=GetRecord&metadataPrefix=datacite4&identifier=oai:pangaea.de:doi:10.1594/PANGAEA.939121
Provenance
Creator Mohn, Helge; Kreyling, Daniel; Wohltmann, Ingo; Lehmann, Ralph; Rex, Markus
Publisher PANGAEA
Publication Year 2021
Funding Reference Helmholtz Association of German Research Centres https://doi.org/10.13039/501100001656 Crossref Funder ID HIDSS-0005 https://www.mardata.de/ Helmholtz School for Marine Data Science (MarData)
Rights Creative Commons Attribution 4.0 International; https://creativecommons.org/licenses/by/4.0/
OpenAccess true
Representation
Resource Type Dataset
Format text/tab-separated-values
Size 24 data points
Discipline Atmospheric Sciences; Atmospheric chemistry; Atmospheric physics; Chemistry; Geosciences; Natural Sciences; Physics