GLOBALISE Ground Truth for Handwritten Text and Layout Recognition

This dataset contains the Ground Truth PageXML files that were used to fine-tune the GLOBALISE models for Handwritten Text Recognition, baseline detection, and region detection (see Related Publications).

This collection includes a datasheet with comprehensive details about the motivation for creating this dataset, the files it comprises, and their potential uses. Additionally, it contains guidelines for creating text region Ground Truth. The transcription Ground Truth files were created in accordance with the guidelines of the Dutch National Archives.
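
For readers who have not worked with PageXML before, the sketch below shows one way to read the three kinds of Ground Truth mentioned above (text regions, baselines, and transcriptions) from a single file. It is a minimal illustration, not the GLOBALISE tooling: the sample document, its ids and coordinates, and the helper names (`local_name`, `parse_points`, `read_regions`) are all made up for this example. Element names follow the general PAGE schema; the files in this dataset may use a different schema version or carry additional attributes, so the code matches on local element names rather than on a fixed namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PageXML fragment (not taken from the dataset).
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="example.jpg" imageWidth="1000" imageHeight="1500">
    <TextRegion id="r1">
      <Coords points="100,100 900,100 900,400 100,400"/>
      <TextLine id="r1l1">
        <Coords points="100,100 900,100 900,200 100,200"/>
        <Baseline points="100,180 900,185"/>
        <TextEquiv><Unicode>Example line of text</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def local_name(tag):
    """Drop the '{namespace}' prefix so any PAGE schema version matches."""
    return tag.rsplit("}", 1)[-1]


def child(elem, name):
    """Return the first direct child with the given local name, or None."""
    for sub in elem:
        if local_name(sub.tag) == name:
            return sub
    return None


def parse_points(points):
    """Turn 'x1,y1 x2,y2 ...' into a list of (x, y) integer tuples.

    Assumes integer pixel coordinates, as in the sample above.
    """
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def read_regions(xml_text):
    """Collect region outlines, line baselines, and line transcriptions."""
    root = ET.fromstring(xml_text)
    regions = []
    for region in root.iter():
        if local_name(region.tag) != "TextRegion":
            continue
        coords = child(region, "Coords")
        lines = []
        for line in region:
            if local_name(line.tag) != "TextLine":
                continue
            baseline = child(line, "Baseline")
            text_equiv = child(line, "TextEquiv")
            unicode_el = child(text_equiv, "Unicode") if text_equiv is not None else None
            lines.append({
                "id": line.get("id"),
                "baseline": parse_points(baseline.get("points")) if baseline is not None else [],
                "text": (unicode_el.text or "") if unicode_el is not None else "",
            })
        regions.append({
            "id": region.get("id"),
            "outline": parse_points(coords.get("points")) if coords is not None else [],
            "lines": lines,
        })
    return regions


regions = read_regions(SAMPLE)
for region in regions:
    for line in region["lines"]:
        print(region["id"], line["id"], line["baseline"], line["text"])
```

To apply this to a file extracted from one of the zip archives, read it as text and pass the contents to `read_regions`.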

Identifier
DOI https://doi.org/10.34894/GVINNQ
Metadata Access https://dataverse.nl/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.34894/GVINNQ
Provenance
Creator Pepping, Kay (ORCID: 0000-0002-3747-706X); Hids, Maartje; Tosun, Merve; Brink, Femke; Swüste, Marja; GLOBALISE project
Publisher DataverseNL
Contributor Petram, Lodewijk; IISG Data; GLOBALISE project
Publication Year 2025
Rights CC-BY-SA-4.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by-sa/4.0
OpenAccess true
Contact Petram, Lodewijk (Huygens Institute); IISG Data (IISG)
Representation
Resource Type Dataset
Format application/pdf; application/zip
Size 175903; 917397; 133327; 450508; 523559; 821561; 511728; 421392; 98310
Version 1.0
Discipline Humanities