UVP5 data sorted with EcoTaxa and MorphoCluster


Here, we provide plankton image data that was sorted with the web applications EcoTaxa and MorphoCluster. The data set was used for image classification tasks as described in Schröder et al. (in preparation) and does not include any geospatial or temporal metadata. Plankton was imaged using the Underwater Vision Profiler 5 (Picheral et al. 2010) in various regions of the world's oceans between 2012-10-24 and 2017-08-08.
This data publication consists of an archive containing "training.csv" (list of 392k training images for classification, validated using EcoTaxa), "validation.csv" (list of 196k validation images for classification, validated using EcoTaxa), "unlabeled.csv" (list of 1M unlabeled images), "morphocluster.csv" (1.2M objects validated using MorphoCluster, a subset of "unlabeled.csv" and "validation.csv") and the image files themselves. The CSV files each contain the columns "object_id" (a unique ID), "image_fn" (the relative filename), and "label" (the assigned name).
The training and validation sets were sorted into 65 classes using the web application EcoTaxa (http://ecotaxa.obs-vlfr.fr). This data shows a severe class imbalance: the 10% most populated classes contain more than 80% of the objects, and the class sizes span four orders of magnitude. The validation set and a set of an additional 1M unlabeled images were sorted during the first trial of MorphoCluster (https://github.com/morphocluster).
The images in this data set were sampled during RV Meteor cruises M92, M93, M96, M97, M98, M105, M106, M107, M108, M116, M119, M121, M130, M131, M135, M136, M137 and M138, during RV Maria S Merian cruises MSM22, MSM23, MSM40 and MSM49, during the RV Polarstern cruise PS88b and during the FLUXES1 experiment with RV Sarmiento de Gamboa.
The following people have contributed to the sorting of the image data on EcoTaxa: Rainer Kiko, Tristan Biard, Benjamin Blanc, Svenja Christiansen, Justine Courboules, Charlotte Eich, Jannik Faustmann, Christine Gawinski, Augustin Lafond, Aakash Panchal, Marc Picheral, Akanksha Singh and Helena Hauss. In Schröder et al. (in preparation), the training set serves as a source for knowledge transfer in the training of the feature extractor. The classification using MorphoCluster was conducted by Rainer Kiko. The labels used are operational and have not yet been matched to the respective EcoTaxa classes.
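As a minimal sketch of how the CSV files and the class imbalance described above could be inspected, the following Python snippet counts objects per "label" and computes the share of objects held by the most populated classes. It assumes that the CSV files are comma-separated and start with a header row naming the columns "object_id", "image_fn" and "label"; the function names, the rounding up of the 10% cutoff and the sample rows are illustrative assumptions, not part of the data publication.

```python
import csv
import io
import math
from collections import Counter


def label_counts(csv_file):
    """Count objects per label, assuming a header row with a "label" column."""
    reader = csv.DictReader(csv_file)
    return Counter(row["label"] for row in reader)


def top_class_share(counts, fraction=0.1):
    """Share of all objects held by the most populated `fraction` of classes.

    The number of top classes is rounded up (an assumption of this sketch).
    """
    n_top = max(1, math.ceil(fraction * len(counts)))
    top = sum(n for _, n in counts.most_common(n_top))
    return top / sum(counts.values())


# Tiny in-memory stand-in for "training.csv"; all values are made up.
sample = io.StringIO(
    "object_id,image_fn,label\n"
    "1,a.jpg,detritus\n"
    "2,b.jpg,detritus\n"
    "3,c.jpg,detritus\n"
    "4,d.jpg,copepod\n"
)
counts = label_counts(sample)
share = top_class_share(counts)
```

To apply this to the extracted archive, the in-memory sample would be replaced by an opened file, for example `open("training.csv", newline="", encoding="utf-8")`, assuming the file is UTF-8 encoded.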

DOI https://doi.org/10.17882/73002
Related Identifier https://www.seanoe.org/data/00618/73002/illustration.gif
Metadata Access http://www.seanoe.org/oai/OAIHandler?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:seanoe.org:73002
Creator Kiko, Rainer; Schröder, Simon-Martin
Publisher SEANOE
Publication Year 2020
Rights CC-BY-NC
OpenAccess true
Contact SEANOE
Resource Type dataset
Discipline Marine Science
Spatial Coverage (-180.000W, -90.000S, 180.000E, 90.000N)