GLOBALISE - VOC Document Segmentation Dataset

This dataset contains detailed annotations of Dutch East India Company (VOC) archival documents, based on the TANAP (Towards a New Age of Partnership) project. It provides precise boundaries and classifications for documents within digitized archival volumes and serves as training data for machine learning approaches to historical document segmentation and classification. This work supports the broader goal of making VOC archives accessible in ways that go beyond traditional finding aids, which often reflect colonial perspectives.

Identifier
DOI https://doi.org/10.34894/QBVEMX
Metadata Access https://dataverse.nl/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.34894/QBVEMX
Provenance
Creator Smit, Renate (ORCID: 0009-0005-1070-636X)
Publisher DataverseNL
Contributor Pepping, Kay
Publication Year 2025
Rights CC-BY-4.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by/4.0
OpenAccess true
Contact Pepping, Kay (Huygens Instituut)
Representation
Resource Type Dataset
Format text/tsv; application/pdf
Size 36846; 54078; 70503; 56850; 27253; 68000; 100734; 100399; 44683; 27791; 30615; 34039; 6993; 6073; 28961; 8415; 7168; 9005; 9750; 26174; 185805
Version 1.0
Discipline Humanities
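
The Metadata Access URL listed under Identifier is an OAI-PMH GetRecord request. As a minimal sketch, it can be rebuilt from the DOI with the Python standard library. The function name `oai_getrecord_url` and the constant `OAI_ENDPOINT` are hypothetical names chosen for illustration; the endpoint, verb, metadata prefix, and DOI are taken from this record.

```python
from urllib.parse import urlencode

# Base URL of the OAI-PMH endpoint, as it appears in this record.
OAI_ENDPOINT = "https://dataverse.nl/oai"


def oai_getrecord_url(doi: str, metadata_prefix: str = "oai_datacite") -> str:
    """Build an OAI-PMH GetRecord URL for a DOI (illustrative helper)."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": f"doi:{doi}",
    }
    # safe=":/" leaves the colon and slash in the DOI unescaped,
    # so the result matches the URL printed in the record.
    return f"{OAI_ENDPOINT}?{urlencode(params, safe=':/')}"


url = oai_getrecord_url("10.34894/QBVEMX")
print(url)
```

The resulting string can be passed to any HTTP client (for example `urllib.request.urlopen`) to retrieve the DataCite XML for this dataset; the sketch stops short of the network call so that it runs offline.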