Ultrahigh-resolution Fourier transform tandem mass spectrometry was employed to reveal novel structural detail of dissolved organic matter (DOM), a complex natural mixture found ubiquitously in soils and rivers. We developed and evaluated a novel approach to decipher the structural detail that is encrypted in DOM. One DOM sample from a spruce forest (Wetzstein, Germany, 50° 27' 13" N; 11° 27' 27" E; 785 m above sea level) and Suwannee River Natural Organic Matter (SRNOM, purchased from the International Humic Substances Society as isolate 2R101N; details given in Green et al. 2015, Environm Eng Sci 32, 1) were used as representative biodegraded DOM mixtures of high complexity and were measured by direct-injection tandem mass spectrometry (DI-ESI-Orbitrap-MS/MS). The unknowns in DOM were then compared with indicative tandem MS features (mass differences, "dm" features, written with the Greek letter delta instead of d) from known standard compounds (14 phenolic standard substances measured in parallel, and 11477 library mass spectra available from the Java-based software framework SIRIUS, which included nearly 18000 unique molecular structures) and with natural product and in-silico structure suggestions.
The dataset consists of seven subsets (Data Set S1 to S7), all of which are xlsx files. "Data Set S1" contains the standard compound data and fragmentation sensitivities (14 phenolic standards) and general information on the analyzed parts of the DOM mass spectrum (molecular indices, number of precursors, number of product ions). Data Sets S2 through S5 contain the aligned DOM molecular composition data obtained at different collision energies for four mass windows ("Data Set S2", m/z 241; "Data Set S3", m/z 301; "Data Set S4", m/z 361; "Data Set S5", m/z 417) and include mass difference matching results (non-indicative dm features, dm features of the 14 phenolic standard compounds, and dm features from SIRIUS library spectra).
"Data Set S6" contains the full dm feature lists and several data tables on individual DOM precursor properties (for example, aggregated matching results for indicative dm features (incl. N- and S-containing precursors), DOM precursor fragmentation sensitivity data, two-way clustering data of precursors and dm features, and structure suggestions classified into broader structural families ("scaffolds")). "Data Set S7" contains the results of a two-way clustering analysis using 725 SIRIUS-annotated dm features. In this dataset, the dm data are used to estimate the structural composition of individual DOM precursor ions. More details can be found in the related manuscript by the same authors.
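The mass difference matching described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name match_dm, the tolerance of 0.0005 Da, the short list of reference dm values, and the example m/z values are all assumptions chosen for illustration. The sketch further assumes singly charged ions, so that the m/z difference between a precursor and a product ion equals the mass of the neutral loss.

```python
# Monoisotopic masses (Da) of a few common neutral losses.
# Illustrative subset only; not the dm feature list of the data set.
KNOWN_DM = {
    "H2O": 18.010565,
    "CO": 27.994915,
    "CH4O": 32.026215,
    "CO2": 43.989829,
}


def match_dm(precursor_mz, product_mzs, known_dm=KNOWN_DM, tol=0.0005):
    """Return (product_mz, dm, label) for each product ion whose mass
    difference to the precursor lies within tol (Da) of a reference dm."""
    matches = []
    for product_mz in product_mzs:
        dm = precursor_mz - product_mz  # assumes singly charged ions
        for label, ref in known_dm.items():
            if abs(dm - ref) <= tol:
                matches.append((product_mz, round(dm, 6), label))
    return matches


# Hypothetical precursor and product ions (not taken from the data set).
precursor = 301.071760
products = [283.061195, 257.081931, 250.000000]

hits = match_dm(precursor, products)
for product_mz, dm, label in hits:
    print(f"{product_mz:.6f}  dm={dm:.6f}  {label}")
```

With these hypothetical values, the first two product ions are matched to a reference dm and the third is left unmatched. In practice the tolerance would need to reflect the mass accuracy of the instrument.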
Version 2; the original version can be found under "Original dataset".