RMQS:
The French Soil Quality Monitoring Network (RMQS) is a national program for the assessment and long-term monitoring of the quality of French soils. This network is based on the monitoring of 2,240 sites representative of French soils and their land use. These sites are spread over the whole French territory (metropolitan and overseas) along a systematic square grid of 16 km x 16 km cells. The network covers a broad spectrum of climatic, soil and land-use conditions (croplands, permanent grasslands, woodlands, orchards and vineyards, natural or scarcely anthropogenic land and urban parkland). The first sampling campaign in metropolitan France took place from 2000 to 2009.
Dataset:
This dataset contains the taxonomic affiliation (genus;family;order;class;phylum) of the 16S rDNA (Archaea + Bacteria) sequences from 1,842 sites of the RMQS.
The soil 16S rDNA gene was sequenced by pyrosequencing (GS FLX Titanium - Roche 454) at Genoscope. Bioinformatics analysis was performed using the BIOCOM-PIPE (previously named GNS-PIPE) metabarcoding pipeline. Taxonomic affiliation of the sequences is based on the Silva r132 database (see this Zenodo repository for details) and was performed on a rarefied dataset (10,000 reads). See the associated articles for details, as well as Terrat et al. (2014). Raw sequencing data are available at EBI.
File structure:
The taxonomy is split across five files, one per taxonomic level (rmqs1_taxonomy_<level>), each with one line per site and one column per taxon.
Each line sums to 10,000 (the rarefaction threshold).
Three supplementary columns are present:
Unknown: not matching any reference.
Unclassified: missing taxa between genus and phylum.
Environmental: matched to a sample from an environmental study, generally with only a phylum name.
Five metadata files give the upper taxonomic levels of each taxon (rmqs1_taxonomy_<level>.metadata.tsv).
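The row-sum property described above can be used as a quick integrity check after download. The following is only a minimal sketch: it assumes the taxonomy files are tab-separated (as the metadata files are) and that the first column holds the site ID under a header named site_id. Both the delimiter and that column name are assumptions, as are the taxon names in the made-up sample table.

```python
import csv
import io

RAREFACTION_DEPTH = 10_000  # reads per site, as stated in the description


def check_row_sums(handle, id_column="site_id", expected=RAREFACTION_DEPTH):
    """Return {site_id: row_sum} for rows whose counts do not sum to `expected`.

    `id_column` is a hypothetical header name; adapt it to the real file.
    """
    reader = csv.DictReader(handle, delimiter="\t")  # tab delimiter is assumed
    bad = {}
    for row in reader:
        site = row.pop(id_column)
        # Every remaining column is a taxon (or Unknown/Unclassified/Environmental)
        total = sum(int(value) for value in row.values())
        if total != expected:
            bad[site] = total
    return bad


# Tiny invented table standing in for a real rmqs1_taxonomy_<level> file
sample = (
    "site_id\tTaxonA\tTaxonB\tUnknown\tUnclassified\tEnvironmental\n"
    "797\t6000\t3000\t500\t400\t100\n"
    "798\t5000\t3000\t500\t400\t100\n"
)
bad_rows = check_row_sums(io.StringIO(sample))
print(bad_rows)
```

With a real file, the same function could be called on an open file handle instead of the in-memory sample; an empty result would mean every site sums to the rarefaction threshold.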
Details:
Samples could not be collected at some sites; these sites do not appear in the dataset.
Some sites did not pass a laboratory or bioinformatics step and did not reach 10,000 sequences before taxonomic assignment; they do not appear in the dataset.
This dataset can be linked with 10.15454/QSXKGA to obtain the physico-chemical properties, land use and coordinates of each sample, or to filter sites using its site_officiel column.
Sites with an ID longer than 4 digits are supplementary sites that are not at the center of their cell (e.g. 10797 and 20797, which both come from cell 797).
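The site ID convention above can be sketched in code when supplementary sites need to be flagged or grouped by grid cell. This is an illustration only: it assumes that the cell number is given by the last four digits of the site ID, which is inferred from the single example (10797 and 20797 from cell 797) and is not confirmed by the description. The function name split_site_id is hypothetical.

```python
def split_site_id(site_id):
    """Return (cell_id, is_supplementary) for an RMQS site ID.

    Assumption: the cell number is the last four digits of the ID,
    as suggested by the example 10797 / 20797 -> cell 797.
    """
    site_id = int(site_id)
    is_supplementary = site_id >= 10_000  # ID longer than 4 digits
    cell_id = site_id % 10_000            # keep the last four digits
    return cell_id, is_supplementary


for example in (797, 10797, 20797):
    print(example, split_site_id(example))
```

Grouping rows by the returned cell number would then bring a central site together with any supplementary sites from the same cell.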