This repository contains the dataset and code used to generate synthetic datasets, as described in the paper "Usefulness of synthetic datasets for diatom automatic detection using a deep-learning approach".
Dataset :
The dataset consists of two components: individual diatom images extracted from publicly available diatom atlases [1,2,3] and individual debris images.
- Individual diatom images : the repository currently contains 166 diatom species, totalling 9230 images. These images were automatically extracted from the atlases by PDF scraping, then cleaned and verified by diatom taxonomists. The subfolders within each diatom species folder indicate the origin of the images: RA[1], IDF[2], BRG[3].
Additional diatom species and images will be added to the repository regularly.
- Individual debris images : the debris images were extracted from real microscopy images. The repository contains 600 debris objects.
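To check a local copy of the diatom images against the counts given above, a short script along these lines may help. It is only a sketch: it assumes a `<root>/<species>/<source>/` layout with the RA, IDF and BRG subfolders described above, and the function name `count_images` and the list of image extensions are illustrative assumptions, not part of this repository.

```python
from collections import Counter
from pathlib import Path

# Assumption: the image file extensions are not documented here; adjust as needed.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}

# Source subfolders named in the dataset description above.
SOURCES = ("RA", "IDF", "BRG")


def count_images(root):
    """Count images per species and per source.

    Assumes a <root>/<species>/<source>/ layout (hypothetical helper).
    Returns a dict mapping species name to a Counter of {source: n_images}.
    """
    counts = {}
    for species_dir in sorted(Path(root).iterdir()):
        if not species_dir.is_dir():
            continue  # skip stray files at the top level
        per_source = Counter()
        for source in SOURCES:
            source_dir = species_dir / source
            if source_dir.is_dir():
                per_source[source] = sum(
                    1
                    for f in source_dir.iterdir()
                    if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
                )
        counts[species_dir.name] = per_source
    return counts


def total_images(counts):
    """Sum the image counts over all species and sources."""
    return sum(sum(per_source.values()) for per_source in counts.values())
```

Calling `count_images` on the folder that holds the species subfolders and passing the result to `total_images` gives a total that can be compared with the figure stated above; `len(counts)` gives the number of species folders found.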
Code :
Contains the code used to generate synthetic microscopy images. For details on how to use the code, refer to the README file in synthetic_data_generator/.