We aimed to explore the community composition of benthic heterotrophic protists in three regions of the southern Baltic Sea: Fehmarnbelt, Oderbank and Roennebank. Sediment samples were collected with a multicorer system and sliced into layers to obtain depth profiles during three cruises in 2020, 2021 and 2022 with the research vessels Elisabeth Mann Borgese (2020, EMB238 and 2021, EMB267) and Alkor (2022, AL570). We performed paired-end NovaSeq sequencing (2 × 150 bp) of the amplified V9 region of the 18S rDNA. For quality control during data analysis, we created an in vitro community (a "mock community") comprising DNA of nine different protist cultures and added this mixture to each individual sequencing run. After sequencing, the raw reads were demultiplexed and the barcode and primer sequences were clipped using cutadapt (Martin 2011). The data was further processed using the dada2 package in R (Callahan et al. 2016). For taxonomic assignment, we compared the sequences using usearch_global (VSEARCH v2.18.0, Rognes et al. 2016) against the PR2 database (Protist Ribosomal Reference database, Guillou et al. 2012, https://pr2-database.org/), supplemented with 150 sequences obtained from our own collection. We discarded all Metazoa, fungi and autotrophic protists (determined on the basis of taxonomic assignment) and retained only amplicon sequence variants (ASVs) of heterotrophic protists with a pairwise identity of >80% to a reference sequence. For the main dataset of samples, we then chose an individual minimum read threshold for each sample according to the accompanying mock community on the respective sequencing lane. To calculate these thresholds, we used the proportion of the lowest read number of an ASV in the mock community data set that could be assigned to one of the cultured species.
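The mock-community threshold described above can be illustrated with a minimal Python sketch. It rests on one reading of the procedure, which is an assumption rather than the authors' confirmed implementation: the threshold proportion is taken as the lowest read count among mock ASVs assigned to a cultured species, divided by the total reads of the mock community, and an ASV is dropped from a sample when its reads fall below that proportion of the sample's total reads. All function names, ASV labels and read counts are hypothetical.

```python
def mock_threshold_proportion(mock_counts, expected_asvs):
    """Proportion of mock reads held by the rarest ASV assigned to a cultured species.

    mock_counts   -- dict mapping ASV label to read count in the mock community
    expected_asvs -- labels of ASVs that were assigned to the cultured species
    """
    total = sum(mock_counts.values())
    # Only consider expected ASVs that were actually detected in the mock.
    detected = [mock_counts[a] for a in expected_asvs if mock_counts.get(a, 0) > 0]
    return min(detected) / total


def apply_threshold(sample_counts, proportion):
    """Keep ASVs whose read count reaches the sample-specific minimum."""
    cutoff = proportion * sum(sample_counts.values())
    return {asv: n for asv, n in sample_counts.items() if n >= cutoff}


# Hypothetical mock community: three expected ASVs plus two stray ASVs.
mock = {"mock_A": 6000, "mock_B": 3900, "mock_C": 50, "stray_X": 30, "stray_Y": 20}
proportion = mock_threshold_proportion(mock, ["mock_A", "mock_B", "mock_C"])

# Hypothetical sample sequenced on the same lane as that mock community.
sample = {"ASV_1": 900, "ASV_2": 95, "ASV_3": 4, "ASV_4": 1}
kept = apply_threshold(sample, proportion)
```

In this toy case the rarest expected mock ASV holds 50 of 10,000 reads, so ASVs with fewer than 0.5% of a sample's reads are removed; whether ties at the cutoff are kept or dropped is a choice made here for illustration only.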
Explanation of data header:
ASV_ID = amplicon sequence variant ID
Number_of_reads = total number of reads
Percentage_Identity = percentage of identity to a sequence from the reference database
GenBank_Closer_Match = accession number of the closest match in the database
Taxonomic ranks from kingdom to species level
Sequences = sequence for each ASV
Seq_length = sequence length for each ASV
Reads for each ASV in each individual sample (columns X144 to X781C15, which refer to sample IDs)

The environmental data for each sample was archived separately in PANGAEA (Sachs & Dünn, 2023, https://doi.org/10.1594/PANGAEA.961784).
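For readers who want to work with the ASV table described in the data header, the following Python sketch shows one way to separate the descriptive columns from the per-sample read columns and to run two simple consistency checks. Several points are assumptions and not confirmed by the dataset description: that the file is tab-delimited, that all sample columns start with "X", and that Number_of_reads equals the sum of the per-sample reads. The toy table is invented and omits the taxonomic rank columns.

```python
import csv
import io

# Invented two-row table using the documented column names (taxonomy omitted).
TOY_TABLE = (
    "ASV_ID\tNumber_of_reads\tPercentage_Identity\tSequences\tSeq_length\tX144\tX145\n"
    "ASV_1\t30\t98.5\tACGTACGT\t8\t10\t20\n"
    "ASV_2\t7\t81.2\tACGTA\t5\t7\t0\n"
)


def read_asv_table(handle):
    """Parse the table and split metadata from per-sample read counts."""
    reader = csv.DictReader(handle, delimiter="\t")
    # Assumption: sample ID columns are exactly those whose name starts with "X".
    sample_cols = [c for c in reader.fieldnames if c.startswith("X")]
    rows = []
    for row in reader:
        counts = {c: int(row[c]) for c in sample_cols}
        rows.append({
            "asv_id": row["ASV_ID"],
            "total_reads": int(row["Number_of_reads"]),
            "identity": float(row["Percentage_Identity"]),
            # Seq_length should match the length of the sequence string.
            "length_ok": int(row["Seq_length"]) == len(row["Sequences"]),
            "counts": counts,
        })
    return sample_cols, rows


sample_cols, rows = read_asv_table(io.StringIO(TOY_TABLE))

# Check (under the stated assumption) that totals match the per-sample sums.
totals_ok = all(r["total_reads"] == sum(r["counts"].values()) for r in rows)
```

To use this on the real file, replace `io.StringIO(TOY_TABLE)` with an opened file handle and adjust the delimiter and the sample-column rule to the actual layout.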