Corpus for the epidemiomonitoring of plant


The EPOP corpus is a collection of 247 documents on plant health. The documents are public web documents about quarantine pests in Europe that have been pre-processed and translated into English. They are split into training (110), development (55), and test (82) sets. The gold-standard annotations for the training and development sets are available in the dataset "Training and development dataset for information extraction in plant epidemiomonitoring". Both datasets are intended for training and evaluating information extraction methods. The EPOP dataset is the basis for the PestCLEF task of the LifeCLEF 2026 challenge.
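As a quick sanity check on the split sizes given in the description, the following minimal sketch confirms that the three sets cover all 247 documents and shows each set's share of the corpus. Only the counts come from the description; the dictionary keys are illustrative names.

```python
# Document counts per split, as stated in the dataset description.
# The key names ("train", "dev", "test") are illustrative, not official file names.
SPLITS = {"train": 110, "dev": 55, "test": 82}

# The three splits should add up to the full corpus size.
total = sum(SPLITS.values())

# Percentage of the corpus in each split, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in SPLITS.items()}

print(total)   # 247
print(shares)  # {'train': 44.5, 'dev': 22.3, 'test': 33.2}
```

In other words, roughly 45% of the documents are for training, 22% for development, and 33% are held out for testing.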

Identifier
DOI https://doi.org/10.57745/YKSEPY
Metadata Access https://entrepot.recherche.data.gouv.fr/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.57745/YKSEPY
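The Metadata Access link above is a standard OAI-PMH `GetRecord` request that returns the record in the `oai_datacite` format. The following is a minimal sketch, using only the Python standard library, of how one might build that request for a given DOI and pull the DOI back out of the returned XML. The function names are hypothetical. The sketch also assumes that the DataCite payload carries an `identifier` element with `identifierType="DOI"`, as in the DataCite metadata schema. Elements are matched by local name so that the exact DataCite namespace version does not matter.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Union
from urllib.parse import urlencode
from urllib.request import urlopen

# OAI-PMH endpoint taken from the Metadata Access URL in this record.
OAI_ENDPOINT = "https://entrepot.recherche.data.gouv.fr/oai"


def build_getrecord_url(doi: str, metadata_prefix: str = "oai_datacite") -> str:
    """Build an OAI-PMH GetRecord URL for a DOI-identified record (hypothetical helper)."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": f"doi:{doi}",
    }
    # Keep ':' and '/' unescaped so the URL has the same form as the one in the record.
    return f"{OAI_ENDPOINT}?{urlencode(params, safe=':/')}"


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def extract_doi(xml_payload: Union[str, bytes]) -> Optional[str]:
    """Return the DataCite DOI identifier from an OAI-PMH response, if present.

    The OAI-PMH <header> also has an <identifier> element, but it has no
    identifierType attribute, so it is skipped here.
    """
    root = ET.fromstring(xml_payload)
    for elem in root.iter():
        if local_name(elem.tag) == "identifier" and elem.get("identifierType") == "DOI":
            return (elem.text or "").strip()
    return None


def fetch_record(doi: str, timeout: float = 30.0) -> bytes:
    """Download the raw OAI-PMH response (requires network access)."""
    with urlopen(build_getrecord_url(doi), timeout=timeout) as resp:
        return resp.read()
```

For example, `extract_doi(fetch_record("10.57745/YKSEPY"))` should return the DOI of this record, assuming the endpoint is reachable and the response follows the structure described above.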
Provenance
Creator MaIAGE; Plateforme ESV
Publisher Recherche Data Gouv
Contributor Claire Nédellec; Marie Grosdidier; Robert Bossy; Sandy Duperier; Isabelle Pieretti; MaIAGE; Plateforme ESV; Louise Deléger; Entrepôt-Catalogue Recherche Data Gouv
Publication Year 2025
Funding Reference Agence nationale de la recherche ANR-20-PCPA-0002; INRAE; PIA DATAIA
Rights info:eu-repo/semantics/openAccess
OpenAccess true
Contact Claire Nédellec (INRAE); Marie Grosdidier (INRAE)
Representation
Resource Type Dataset
Format application/zip
Size 381789
Version 2.0
Discipline Agriculture, Forestry, Horticulture; Computer Science; Life Sciences; Agricultural Sciences; Agriculture, Forestry, Horticulture, Aquaculture; Agriculture, Forestry, Horticulture, Aquaculture and Veterinary Medicine; Medicine