FMPD - Freshwater Microscopy Phytoplankton Dataset

  1. Rivas-Villar, David [1]
  2. Figueroa, Jorge [1]
  3. Carballeira, Rafael
  4. Rouco, José [1]
  5. Novo, Jorge [1]

  [1] Universidade da Coruña, La Coruña, Spain
      ROR https://ror.org/01qckj285

Publisher: Zenodo

Publication year: 2024

Type: Dataset

CC BY-NC-ND 4.0

Abstract

This dataset, FMPD (Freshwater Microscopy Phytoplankton Dataset), is released for non-commercial academic or research purposes only, subject to attribution through citation of the following papers:

- Figueroa, J. Rouco, J. Novo, "Phytoplankton detection and recognition in freshwater digital microscopy images using deep learning object detectors", Heliyon, 2023.
- D. Rivas-Villar, J. Rouco, M. G. Penedo, R. Carballeira, J. Novo, "Fully automatic detection and classification of phytoplankton specimens in digital microscopy images", Computer Methods and Programs in Biomedicine, 200, 105923, 2021.

Please also consider citing any of the other related papers from the dataset authors.

Data: The FMPD dataset is a set of multi-specimen microscopy images of freshwater phytoplankton. All images were captured with the same fixed settings, including illumination, focal point and magnification. The dataset contains 293 images of water sampled at the lake of Doniños (Ferrol, Galicia, Spain) (UTM 555593 X, 4815672 Y; Datum ETRS89) on multiple visits throughout the year, which ensures seasonal representativeness.

The phytoplankton sample was concentrated by filtering a volume of 0.5 L through GF/F glass fiber filters and was then resuspended in 50 mL. Phytoplankton samples were preserved using 5% (v/v) glutaraldehyde, because it is efficient at preserving both cellular structures and pigment. The fixed sample was stored in the dark at constant temperature (10 °C) until analysis. The phytoplankton sample was homogenised for 2 min prior to microscopic examination. In addition, the sample was subjected to vacuum for one minute to break the vacuoles of some cyanobacterial taxa and prevent them from floating. Aliquots of the phytoplankton sample with a total volume of 1 mL were examined under light microscopy using a Nikon Eclipse E600 equipped with an E-Plan 10× objective (N.A. 0.25).
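The concentration step described above (0.5 L filtered, then resuspended in 50 mL) implies a nominal concentration factor that can be worked out directly. The sketch below is only illustrative arithmetic: it assumes complete retention of the specimens on the filter and complete recovery on resuspension, which the description does not state, and the function name is hypothetical.

```python
def concentration_factor(filtered_volume_ml, resuspension_volume_ml):
    """Nominal concentration factor, assuming no loss of specimens (an assumption)."""
    return filtered_volume_ml / resuspension_volume_ml


# Volumes taken from the description: 0.5 L filtered, resuspended in 50 mL.
factor = concentration_factor(500.0, 50.0)

# Volume of original lake water nominally represented by the 1 mL examined.
examined_ml = 1.0
original_equivalent_ml = examined_ml * factor
```

Under these assumptions, the 1 mL examined under the microscope nominally corresponds to 10 mL of the original water sample.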
Light microscopy images were taken with an AxioCam ICc5 Zeiss digital camera, maintaining the same illumination and focus throughout the image acquisition process and following regular transects until the entire surface of the sample was covered.

The dataset contains 293 multi-specimen phytoplankton images. As mentioned, these images have fixed magnification, illumination and focal point. The images are saved in .tif format with a size of 2080x1540 pixels and are located in the dataset folder.

The ground truth consists of bounding boxes that enclose the phytoplankton specimens, each with an associated label identifying the species. Currently, this dataset has tags for:

- Non-phytoplankton: particles, debris, zooplankton or any other object that could be mistaken for phytoplankton
- Woronichinia naegeliana: toxin-producing cyanobacteria
- Anabaena Spiroides: toxin-producing cyanobacteria
- Dinobryon Sp.: harmless but challenging, as it can appear either solitary or in colonies
- Other-phytoplankton: other phytoplankton species

Annotations are provided in the annotations.json file, in the format typically used by the COCO dataset.

Holdout train-test splits, as well as k-fold cross-validation splits, are provided in the splits folder in .json format. These splits correspond to those used in the papers listed above, which facilitates straightforward comparisons. Additionally, the annotations for each subset are included in separate files within the same folder for ease of use. Note that annotations.json contains all of these subsets of annotations.
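As a minimal sketch of how the annotations described above might be read, the following Python code parses a COCO-style file, counts bounding boxes per category, and converts each box from the COCO [x, y, width, height] convention to corner coordinates. It assumes that annotations.json follows the standard COCO object-detection layout (top-level "images", "annotations" and "categories" keys). The function names, the toy record and the file name example_0001.tif are hypothetical and are not taken from the dataset.

```python
import json
from collections import Counter
from pathlib import Path


def load_coco(path):
    """Read a COCO-style annotation file into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def count_boxes_per_category(coco):
    """Return {category name: number of bounding boxes}."""
    names = {c["id"]: c["name"] for c in coco["categories"]}
    return dict(Counter(names[a["category_id"]] for a in coco["annotations"]))


def boxes_by_image(coco):
    """Map file name -> list of (x_min, y_min, x_max, y_max, label)."""
    files = {img["id"]: img["file_name"] for img in coco["images"]}
    names = {c["id"]: c["name"] for c in coco["categories"]}
    out = {file_name: [] for file_name in files.values()}
    for a in coco["annotations"]:
        x, y, w, h = a["bbox"]  # COCO boxes are [x, y, width, height]
        label = names[a["category_id"]]
        out[files[a["image_id"]]].append((x, y, x + w, y + h, label))
    return out


# Tiny synthetic record for illustration only (not real dataset content).
toy = {
    "images": [
        {"id": 1, "file_name": "example_0001.tif", "width": 2080, "height": 1540}
    ],
    "categories": [
        {"id": 1, "name": "Woronichinia naegeliana"},
        {"id": 2, "name": "Non-phytoplankton"},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [100, 200, 50, 40]},
        {"id": 11, "image_id": 1, "category_id": 2, "bbox": [300, 20, 10, 10]},
        {"id": 12, "image_id": 1, "category_id": 1, "bbox": [5, 5, 20, 30]},
    ],
}

counts = count_boxes_per_category(toy)
boxes = boxes_by_image(toy)
```

With the real data, `load_coco("annotations.json")` would replace the toy record. The same functions may also work on the per-subset annotation files in the splits folder, provided they share the COCO layout, which is an assumption here.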