GLOBALISE - VOC Document Segmentation Dataset

This dataset contains detailed annotations of Dutch East India Company (VOC) archival documents based on the TANAP (Towards a New Age of Partnership) project. The dataset provides precise boundaries and classifications for documents within digitized archival volumes, serving as training data for machine learning approaches to historical document segmentation and classification. This work supports the broader goal of making VOC archives more accessible beyond traditional finding aids that often reflect colonial perspectives.

This dataset was migrated from the IISH Data Collection to DataverseNL in January 2026. It was previously published under the persistent identifier hdl:10622/XMCZLZ as: Smit, Renate, 2026, "GLOBALISE - VOC Document Segmentation Dataset", https://hdl.handle.net/10622/XMCZLZ, IISH Data Collection, V1.

Identifier
DOI https://doi.org/10.34894/XAJ12C
Metadata Access https://dataverse.nl/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.34894/XAJ12C
Provenance
Creator Smit, Renate (ORCID: 0009-0005-1070-636X); Pepping, Kay (ORCID: 0000-0002-3747-706X)
Publisher DataverseNL
Contributor IISH Data-department
Publication Year 2026
Rights CC-BY-4.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by/4.0
OpenAccess true
Contact IISH Data-department (International Institute for Social History)
Representation
Resource Type Dataset
Format text/csv; application/pdf
Size 35875; 52653; 68669; 43310; 43492; 55376; 26564; 66223; 98079; 97981; 43579; 27085; 29862; 33184; 6814; 5918; 28227; 8205; 6987; 8778; 9515; 25516; 195041
Version 3.0
Discipline Humanities
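
The Metadata Access link listed above has the shape of an OAI-PMH GetRecord request. As a minimal sketch, the following Python shows how that URL could be assembled from the DOI. The function name `build_getrecord_url` and the constant `OAI_ENDPOINT` are hypothetical names chosen for illustration; the endpoint, verb, metadata prefix, and identifier are copied from the record itself. Nothing is fetched here, since the response format is not described in this record.

```python
from urllib.parse import urlencode

# Endpoint copied from the "Metadata Access" field of this record.
OAI_ENDPOINT = "https://dataverse.nl/oai"


def build_getrecord_url(doi: str, metadata_prefix: str = "oai_datacite") -> str:
    """Assemble a GetRecord URL for a DOI (illustrative helper, not an official client)."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": f"doi:{doi}",
    }
    # Keep ':' and '/' unescaped so the result matches the link shown in the record.
    return f"{OAI_ENDPOINT}?{urlencode(params, safe=':/')}"


if __name__ == "__main__":
    print(build_getrecord_url("10.34894/XAJ12C"))
```

Called with the DOI from the Identifier field, the helper reproduces the Metadata Access link given in this record; the resulting URL can then be opened with any HTTP client.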