Corpus for the epidemiomonitoring of plant

The EPOP corpus is a collection of 247 documents on plant health. The documents are public web documents about quarantine pests in Europe that have been pre-processed and translated into English. The documents are split into training (110), development (55), and test (82) sets. The gold-standard annotations for the training and development sets are available in the "Training and development dataset for information extraction in plant epidemiomonitoring" dataset. Both datasets are intended for the training and evaluation of information extraction methods. The EPOP dataset is the basis for the PestCLEF task of the LifeCLEF 2026 challenge.
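
As a quick sanity check on the split sizes given in the description, the short Python sketch below sums the three sets and reports each one's share of the corpus. The counts come from the description; the names `SPLITS` and `split_shares` are illustrative only and are not part of the dataset.

```python
# Split sizes as stated in the dataset description.
SPLITS = {"training": 110, "development": 55, "test": 82}


def split_shares(splits):
    """Return each split's share of the total document count."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}


if __name__ == "__main__":
    total = sum(SPLITS.values())
    print(f"total documents: {total}")
    for name, share in split_shares(SPLITS).items():
        print(f"{name}: {SPLITS[name]} documents ({share:.1%})")
```

The three sets add up to the 247 documents stated for the corpus, with roughly a third of the documents held out for testing.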

Identifier
DOI	https://doi.org/10.57745/YKSEPY
Metadata Access	https://entrepot.recherche.data.gouv.fr/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.57745/YKSEPY
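
The Metadata Access link above is an OAI-PMH GetRecord request. The following Python sketch shows one way such a request URL could be assembled from the DOI and, optionally, fetched. The endpoint, verb, metadata prefix, and identifier are taken from the link itself; the function names `build_record_url` and `fetch_record`, the timeout, and the UTF-8 decoding of the response are assumptions made for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint and DOI as they appear in the Identifier block.
OAI_ENDPOINT = "https://entrepot.recherche.data.gouv.fr/oai"
DOI = "10.57745/YKSEPY"


def build_record_url(doi, metadata_prefix="oai_datacite"):
    """Assemble an OAI-PMH GetRecord URL for the given DOI."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "metadataPrefix": metadata_prefix,
            "identifier": f"doi:{doi}",
        },
        safe=":/",  # keep "doi:10.57745/..." readable, as in the original link
    )
    return f"{OAI_ENDPOINT}?{query}"


def fetch_record(doi, timeout=30):
    """Download the metadata record as text (assumes a UTF-8 XML response)."""
    with urlopen(build_record_url(doi), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the URL; call fetch_record(DOI) to perform the request.
    print(build_record_url(DOI))
```

Building the URL is kept separate from the network call so the address can be checked or logged without contacting the repository.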

Provenance
Creator	MaIAGE; Plateforme ESV
Publisher	Recherche Data Gouv
Contributor	Claire Nédellec; Marie Grosdidier; Robert Bossy; Sandy Duperier; Isabelle Pieretti; MaIAGE; Plateforme ESV; Louise Deléger; Entrepôt-Catalogue Recherche Data Gouv
Publication Year	2025
Funding Reference	Agence nationale de la recherche ANR-20-PCPA-0002; INRAE; PIA DATAIA
Rights	info:eu-repo/semantics/openAccess
OpenAccess	true
Contact	Claire Nédellec (INRAE); Marie Grosdidier (INRAE)

Representation
Resource Type	Dataset
Format	application/zip
Size	381789
Version	2.0
Discipline	Agriculture, Forestry, Horticulture; Computer Science; Life Sciences; Agricultural Sciences; Agriculture, Forestry, Horticulture, Aquaculture; Agriculture, Forestry, Horticulture, Aquaculture and Veterinary Medicine; Medicine