Amplicon sequence variant table of benthic heterotrophic protists of the southern Baltic Sea sampled in 2020-2022


We aimed to explore the community composition of benthic heterotrophic protists in three regions of the southern Baltic Sea: Fehmarnbelt, Oderbank and Roennebank. Sediment samples were collected with a multicorer system and sliced into a layered depth profile during three cruises in 2020, 2021 and 2022 with the research vessels Elisabeth Mann Borgese (2020, EMB238 and 2021, EMB267) and Alkor (2022, AL570). We performed paired-end NovaSeq sequencing (2 × 150 bp) of the amplified V9 region of the 18S rDNA. As a quality control for the subsequent data analysis, we created an in vitro community, called a "mock community", comprising DNA of nine different protist cultures, and added this mixture to each individual sequencing run. After sequencing, the raw reads were demultiplexed and the barcode and primer sequences were clipped using cutadapt (Martin 2011). The data were further processed using the dada2 package in R (Callahan et al. 2016). For taxonomic assignment we used the PR2 database (Protist Ribosomal Reference database, Guillou et al. 2012, https://pr2-database.org/ ) updated with 150 sequences obtained from our own collection, using usearch_global (v2.18.0, Rognes et al. 2016). We discarded all Metazoa, fungi and autotrophic protists (determined on the basis of taxonomic assignment) and retained only amplicon sequence variants (ASVs) of heterotrophic protists with a pairwise identity of >80% to a reference sequence. For the main dataset of samples, we then chose individual minimum thresholds per sample according to the accompanying mock community on the respective sequencing lane. To calculate these thresholds, we used the proportion of the lowest read number of an ASV in the mock community data set that could be assigned to the cultured species.
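
The identity filter and the mock-community threshold described above can be sketched in Python. This is one possible reading of the description, not the authors' actual workflow (which used R): it assumes the threshold is the read share of the rarest expected mock ASV, and that each sample's cutoff is that share multiplied by the sample's total reads. All function names, ASV labels and read counts are hypothetical and purely illustrative.

```python
def filter_by_identity(asvs, min_identity=80.0):
    """Keep ASV records whose identity to a reference is strictly above the cutoff."""
    return [a for a in asvs if a["Percentage_Identity"] > min_identity]


def mock_threshold_proportion(mock_reads, expected_asvs):
    """Share of all mock-community reads held by the rarest expected ASV.

    mock_reads: dict of ASV label -> read count in the mock community.
    expected_asvs: labels of ASVs assigned to the cultured species.
    """
    total = sum(mock_reads.values())
    if total == 0:
        raise ValueError("mock community contains no reads")
    counts = [mock_reads[a] for a in expected_asvs if mock_reads.get(a, 0) > 0]
    if not counts:
        raise ValueError("no expected ASV was detected in the mock community")
    return min(counts) / total


def apply_threshold(sample_reads, proportion):
    """Drop ASVs whose read count falls below proportion * total sample reads."""
    cutoff = proportion * sum(sample_reads.values())
    return {asv: n for asv, n in sample_reads.items() if n >= cutoff}


# Illustrative numbers only (not taken from the dataset).
mock = {"mock_1": 9000, "mock_2": 900, "mock_3": 50, "unexpected": 50}
proportion = mock_threshold_proportion(mock, ["mock_1", "mock_2", "mock_3"])

sample = {"ASV_1": 1990, "ASV_2": 8, "ASV_3": 2}
kept = apply_threshold(sample, proportion)

records = [
    {"ASV_ID": "ASV_1", "Percentage_Identity": 98.5},
    {"ASV_ID": "ASV_2", "Percentage_Identity": 79.0},
]
passed = filter_by_identity(records)
```

In this toy example the rarest expected mock ASV holds 50 of 10,000 reads, so any ASV with fewer than 0.5% of a sample's reads is dropped from that sample.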

Explanation of data header:
ASV_ID = amplicon sequence variant ID
Number_of_reads = total number of reads
Percentage_Identity = percentage of identity to a sequence from the reference database
GenBank_Closer_Match = accession number of the closest match in the database
Taxonomic ranks from Kingdom down to species level
Sequences = sequence for each ASV
Seq_length = sequence length for each ASV
Reads for each ASV in each individual sample (columns X144 to X781C15, which refer to sample IDs)

The environmental data for each sample were archived separately in PANGAEA (Sachs & Dünn, 2023, https://doi.org/10.1594/PANGAEA.961784).
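
As a rough illustration of the table layout described above, the following Python sketch separates the annotation columns from the per-sample read columns and sums the reads per sample. It assumes the spreadsheet has been exported to CSV and that sample columns can be recognised by a leading "X" followed by a digit (as in X144 and X781C15). The helper names and the two example rows are hypothetical and not taken from the dataset.

```python
import csv
import io
import re

# Assumption: sample ID columns start with "X" followed by a digit.
SAMPLE_COLUMN = re.compile(r"^X\d")


def split_columns(fieldnames):
    """Return (annotation_columns, sample_columns) from the header row."""
    samples = [c for c in fieldnames if SAMPLE_COLUMN.match(c)]
    annotations = [c for c in fieldnames if not SAMPLE_COLUMN.match(c)]
    return annotations, samples


def per_sample_totals(rows, sample_columns):
    """Sum the read counts of all ASVs for each sample column."""
    totals = {c: 0 for c in sample_columns}
    for row in rows:
        for c in sample_columns:
            totals[c] += int(row[c] or 0)  # treat empty cells as zero reads
    return totals


# Made-up miniature table using header names from the description above.
EXAMPLE_CSV = """ASV_ID,Number_of_reads,Percentage_Identity,Seq_length,X144,X781C15
ASV_1,120,98.5,130,100,20
ASV_2,30,85.0,128,0,30
"""

reader = csv.DictReader(io.StringIO(EXAMPLE_CSV))
rows = list(reader)
annotations, samples = split_columns(reader.fieldnames)
totals = per_sample_totals(rows, samples)
```

For the real file, the same functions could be applied to rows read from a CSV export of the spreadsheet.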

Identifier
DOI https://doi.org/10.1594/PANGAEA.961796
Related Identifier IsSupplementTo https://doi.org/10.3390/biology12071010
Related Identifier References https://doi.org/10.1594/PANGAEA.961784
Related Identifier References https://doi.org/10.1038/nmeth.3869
Related Identifier References https://doi.org/10.1093/nar/gks1160
Related Identifier References https://doi.org/10.14806/ej.17.1.200
Related Identifier References https://doi.org/10.7717/peerj.2584
Metadata Access https://ws.pangaea.de/oai/provider?verb=GetRecord&metadataPrefix=datacite4&identifier=oai:pangaea.de:doi:10.1594/PANGAEA.961796
Provenance
Creator Sachs, Maria; Dünn, Manon
Publisher PANGAEA
Publication Year 2023
Funding Reference Federal Ministry of Education and Research https://doi.org/10.13039/501100002347 Crossref Funder ID 03F0848D https://foerderportal.bund.de/foekat/jsp/SucheAction.do?actionMode=view&fkz=03F0848D DAM sustainMare - MGF Baltic Sea, University of Cologne
Rights Creative Commons Attribution 4.0 International; https://creativecommons.org/licenses/by/4.0/
OpenAccess true
Representation
Resource Type Dataset
Format application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Size 678.9 kBytes
Discipline Earth System Research
Spatial Coverage (10.686W, 54.249S, 14.333E, 54.773N); Baltic Sea
Temporal Coverage Begin 2020-05-27T12:11:00Z
Temporal Coverage End 2022-04-03T14:10:24Z