The corpus is the collection of 165 documents on plant health to which the manual annotations of the 'Training and development dataset for information extraction in plant epidemiomonitoring' apply. The documents are public web documents about quarantine pests in Europe that have been pre-processed and translated into English.
The annotations in the Training and development dataset refer to character positions within the documents of the corpus. Both datasets are intended for the training and validation of information extraction methods.
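Because the annotations point at character positions, an annotation can be resolved by slicing the corresponding document text. The following is a minimal sketch in Python, not the dataset's actual file format: the Annotation class, the label names, the sample sentence and the half-open (start inclusive, end exclusive) offset convention are all assumptions made for illustration and should be checked against the dataset documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical stand-off annotation; field names are illustrative only."""
    label: str
    start: int  # first character of the span (inclusive)
    end: int    # one past the last character (exclusive, an assumption)


def span_text(document: str, ann: Annotation) -> str:
    """Return the text an annotation refers to, checking the offsets first."""
    if not (0 <= ann.start <= ann.end <= len(document)):
        raise ValueError(f"offsets {ann.start}-{ann.end} fall outside the document")
    return document[ann.start:ann.end]


# Invented sample sentence, not taken from the corpus.
document = "Xylella fastidiosa was detected on olive trees in Apulia."

# Invented labels and offsets, for illustration only.
annotations = [
    Annotation("Pest", 0, 18),
    Annotation("Plant", 35, 46),
    Annotation("Location", 50, 56),
]

for ann in annotations:
    print(ann.label, repr(span_text(document, ann)))
```

Offsets of this kind are only valid against the exact text they were created on, so the corpus documents should be read without any additional normalisation (for example, whitespace or line-ending changes) before the annotations are applied.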