The Surface Ocean CO2 Atlas (SOCAT) version 2021 (v2021) dataset (Bakker et al., 2016; Bakker et al., 2021) is a quality-controlled dataset containing 30.6 million surface ocean gaseous CO2 measurements collated from thousands of individual submissions. These measurements are typically collected at a range of depths (of the order of several metres below the surface) using many different systems, and the sampling depth depends on the sampling platform and/or setup. Different platforms (e.g. ships of opportunity, research vessels) and systems collect water samples at different depths, and the sampling depth can even vary with sea state. Therefore, the collated SOCAT dataset contains high-quality data, but these data are valid for different and inconsistent depths. This means that the individual gaseous CO2 measurements and gridded data provided by SOCAT are sub-optimal for calculating global or regional atmosphere-ocean gas exchange (and the resultant net CO2 sinks) and for verifying gas fluxes from (or assimilation into) numerical models. Accurate calculations of CO2 flux between the atmosphere and oceans require CO2 concentrations at the top and bottom of the mass boundary layer, the ~100 μm deep layer that forms the interface between the ocean and the atmosphere (Woolf et al., 2016). Ignoring vertical temperature gradients across this very thin layer can result in significant biases in the concentration differences and the resulting gas fluxes (e.g. a ~5 to 29% underestimate in global net CO2 sink values; Woolf et al., 2016). It is currently impossible to measure the CO2 concentrations on either side of this very thin layer, but it is possible to calculate them using the SOCAT data, satellite observations and knowledge of the carbonate system.
Therefore, to make the SOCAT data optimal for accurate atmosphere-ocean gas flux calculations, a reanalysis methodology was developed to calculate the fugacity of CO2 (fCO2) at the bottom of the mass boundary layer (termed the sub-skin value). The theoretical basis and justification are described in detail in Woolf et al. (2016), and the reanalysis methodology is described in detail in Goddijn-Murphy et al. (2015). The reanalysis calculation exploits paired in situ temperature and fCO2 measurements in the SOCAT dataset, and uses an Earth observation dataset to provide a depth-consistent (sub-skin) temperature field to which all fugacity data are reanalysed. The outputs provide paired fCO2 (and partial pressure of CO2, pCO2) and temperature data that correspond to a consistent sub-skin layer temperature. These can then be used to accurately calculate concentration differences and atmosphere-ocean CO2 gas fluxes. This data submission contains a reanalysis of fCO2 from the SOCAT version 2021 dataset to a consistent sub-skin temperature field. The reanalysis was performed using a tool that is distributed within the FluxEngine V4.0.1 open source software toolkit (https://github.com/oceanflux-ghg/FluxEngine) (Shutler et al., 2016; Holding et al., 2019). All data processing and driver scripts are available from the FluxEngine Ancillary Tools (FEAT) repository (https://github.com/oceanflux-ghg/FluxEngineAncillaryTools). The National Oceanic and Atmospheric Administration (NOAA) Optimum Interpolation Sea Surface Temperature (OISST) dataset (Reynolds et al., 2007) was used to provide the climate-quality, depth-consistent temperature data. The original ¼ degree OISST weekly data (v2.1) were first resampled to provide monthly mean values on a 1° by 1° grid (using the Python tools provided in the FEAT repository). These monthly 1° by 1° data were then used as the temperature input for the reanalysis.
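The core idea of the reanalysis, adjusting each fCO2 value from its in situ temperature to the sub-skin temperature, can be illustrated with a minimal sketch. This is not the FluxEngine implementation: the exact expression used is the one given in Goddijn-Murphy et al. (2015), and the function name and the exponential coefficient below (0.0423 per degree C, a commonly quoted isochemical temperature sensitivity of seawater CO2) are assumptions made purely for illustration.

```python
import math

# ASSUMPTION: a commonly quoted isochemical temperature sensitivity of
# seawater CO2 (about 4.23 % per degree C). The reanalysis itself may use
# a different expression; see Goddijn-Murphy et al. (2015).
TEMPERATURE_SENSITIVITY = 0.0423


def reanalyse_fco2(fco2_insitu, t_insitu, t_subskin,
                   sensitivity=TEMPERATURE_SENSITIVITY):
    """Adjust fCO2 (uatm) from the in situ temperature to the sub-skin
    temperature (both in degrees C). Hypothetical helper, not FluxEngine API.
    """
    values = (fco2_insitu, t_insitu, t_subskin)
    # Propagate missing inputs as NaN, mirroring the NaN convention
    # used for rows where no sub-skin temperature is available.
    if any(v is None or math.isnan(v) for v in values):
        return math.nan
    return fco2_insitu * math.exp(sensitivity * (t_subskin - t_insitu))


# A sub-skin temperature warmer than the in situ temperature raises fCO2.
adjusted = reanalyse_fco2(380.0, 15.0, 15.5)
```

Under this assumed relationship, a measurement is unchanged when the two temperatures agree, and a missing temperature yields NaN rather than a silently uncorrected value.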
The resulting reanalysed data are provided as a tab-separated value file (individual data points) and as a netCDF-5 file (gridded monthly means). These are the same file formats as provided by SOCAT and are analogous to the SOCAT single data point and gridded data. Each row in the tab-separated value file corresponds to a row in the original SOCAT version 2021 dataset. The original SOCAT version 2021 data are included in full, with four additional columns containing the reanalysed data:
* T_reynolds - The temperature (in degrees C) taken from the consistent OISST temperature field for the corresponding time and location.
* fCO2_reanalysed - The fugacity of CO2 (in μatm) reanalysed to the consistent surface temperature indicated by T_reynolds.
* pCO2_SST - The partial pressure of CO2 (in μatm) corresponding to the in situ (measured) temperature.
* pCO2_reanalysed - The partial pressure of CO2 (in μatm) reanalysed to the consistent surface temperature indicated by T_reynolds.
The netCDF gridded version of the reanalysed dataset contains monthly mean data binned onto a 1° by 1° grid, and uses the same units, missing value indicators, and temporal and spatial resolution as the original SOCAT gridded product to maximise compatibility. The gridding is performed using the SOCAT gridding methodology (Sabine et al., 2013). The implementation of the gridding has been verified by applying it to the original (non-reanalysed) SOCAT data; all results were identical to 8 decimal places. The results of gridding the original SOCAT data are included within these netCDF data, along with additional variables containing the equivalent results for the reanalysed SOCAT data. Statistical sample mean, minimum, maximum, standard deviation and count data for each grid cell are included, with unweighted and cruise-weighted versions (following the convention used by SOCAT). Full metadata are included within the file.
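The difference between the unweighted and cruise-weighted statistics can be shown with a short sketch. It assumes that the unweighted mean averages every observation in a monthly 1° by 1° cell, while the cruise-weighted mean first averages per cruise and then averages those cruise means; the definitive definitions are those of Sabine et al. (2013). The function name, the tuple layout and the sample values are hypothetical, and only a subset of the statistics is computed.

```python
import math
from collections import defaultdict
from statistics import mean


def grid_monthly(observations):
    """Bin (cruise_id, year, month, lat, lon, value) tuples into monthly
    1-degree cells. Illustrative only; not the SOCAT or FluxEngine code."""
    cells = defaultdict(lambda: defaultdict(list))
    for cruise_id, year, month, lat, lon, value in observations:
        if math.isnan(value):
            continue  # skip missing values, e.g. rows without a reanalysis
        # A cell is identified by the integer degree of its south-west corner.
        key = (year, month, math.floor(lat), math.floor(lon))
        cells[key][cruise_id].append(value)

    gridded = {}
    for key, by_cruise in cells.items():
        all_values = [v for values in by_cruise.values() for v in values]
        gridded[key] = {
            "count": len(all_values),
            "min": min(all_values),
            "max": max(all_values),
            # Every observation contributes equally.
            "unweighted_mean": mean(all_values),
            # Every cruise contributes equally, however many points it has.
            "cruise_weighted_mean": mean(mean(v) for v in by_cruise.values()),
        }
    return gridded


# Made-up sample: cruise "A" has three points in the cell, cruise "B" has one.
observations = [
    ("A", 2000, 1, 10.2, 20.7, 350.0),
    ("A", 2000, 1, 10.8, 20.1, 360.0),
    ("A", 2000, 1, 10.5, 20.5, 370.0),
    ("B", 2000, 1, 10.4, 20.3, 400.0),
]
cell = grid_monthly(observations)[(2000, 1, 10, 20)]
```

In this made-up sample the unweighted mean is pulled towards the cruise with more samples, whereas the cruise-weighted mean treats both cruises equally.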
1. Due to the temporal range of the OISST dataset, the reanalysed values are only available from 1981 onwards. Pre-1981 rows contain NaN (not-a-number) in the reanalysis columns.
2. The download for this submission is provided as a single .zip file (1.6 GB, uncompressed: 13.1 GB) containing two files: SOCATv2021_reanalysed_using_subskin_with_header-v1r2.tsv (containing every data point, ungridded) and SOCATv2021_reanalysed_using_subskin-v1r2.nc (the gridded monthly mean data).
3. The dataset can be referred to as the reanalysis of SOCATv2021, reanalysis version 1 revision 2 (v1r2). Please cite this PANGAEA submission, the theory (Woolf et al., 2016), the reanalysis methodology (Goddijn-Murphy et al., 2015), the FluxEngine toolbox that was used to perform the reanalysis (Shutler et al., 2016; Holding et al., 2019), and the original SOCAT dataset (Bakker et al., 2016, 2021) and/or gridded equivalent (Sabine et al., 2013; Bakker et al., 2021).
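Because rows before 1981 carry NaN in the reanalysis columns, readers of the tab-separated file will usually want to skip them. The sketch below reads the four added columns with the Python standard library and keeps only rows with a valid fCO2_reanalysed value. It assumes that the first line passed in is the column header row; if the real file begins with a metadata preamble, those lines would need to be skipped first. The function names and the in-memory sample are hypothetical.

```python
import csv
import io
import math

# The four columns added by the reanalysis, as named in the description.
REANALYSIS_COLUMNS = ("T_reynolds", "fCO2_reanalysed",
                      "pCO2_SST", "pCO2_reanalysed")


def to_float(text):
    """Parse a cell as float, treating blanks or bad values as NaN."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return math.nan


def rows_with_reanalysis(lines):
    """Yield the reanalysis columns for rows whose fCO2_reanalysed is valid.

    `lines` is any iterable of text lines (an open file or io.StringIO)
    whose first line is assumed to be the tab-separated column header.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        values = {name: to_float(row.get(name)) for name in REANALYSIS_COLUMNS}
        if math.isnan(values["fCO2_reanalysed"]):
            continue  # e.g. pre-1981 rows with no OISST temperature
        yield values


# Made-up two-row sample standing in for the real 13.1 GB file.
sample = io.StringIO(
    "T_reynolds\tfCO2_reanalysed\tpCO2_SST\tpCO2_reanalysed\n"
    "NaN\tNaN\t352.1\tNaN\n"
    "15.2\t361.0\t362.4\t362.3\n"
)
kept = list(rows_with_reanalysis(sample))
```

Streaming the rows through a generator, rather than loading the whole table, keeps memory use modest for a file of this size.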