GLOBALISE - VOC Document Segmentation Dataset

This dataset contains detailed annotations of Dutch East India Company (VOC) archival documents based on the TANAP (Towards a New Age of Partnership) project. The dataset provides precise boundaries and classifications for documents within digitized archival volumes, serving as training data for machine learning approaches to historical document segmentation and classification. This work supports the broader goal of making VOC archives more accessible beyond traditional finding aids that often reflect colonial perspectives.

Identifier
DOI	https://doi.org/10.34894/QBVEMX
Metadata Access	https://dataverse.nl/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.34894/QBVEMX
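The Metadata Access URL above has the shape of an OAI-PMH GetRecord request. As a minimal sketch, assuming the same endpoint and parameter names shown in that URL, it can be rebuilt from the DOI as follows. The helper name `oai_getrecord_url` is hypothetical and not part of any DataverseNL tooling, and the snippet only builds the URL without fetching it.

```python
from urllib.parse import urlencode

# Base URL copied from the Metadata Access entry of this record.
OAI_ENDPOINT = "https://dataverse.nl/oai"


def oai_getrecord_url(doi: str, metadata_prefix: str = "oai_datacite") -> str:
    """Build an OAI-PMH GetRecord URL for a DOI (hypothetical helper)."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": f"doi:{doi}",
    }
    # Leave ':' and '/' unescaped so the result matches the URL form in the record.
    return f"{OAI_ENDPOINT}?{urlencode(params, safe=':/')}"


url = oai_getrecord_url("10.34894/QBVEMX")
print(url)
```

Passing a different `metadata_prefix` (for example `oai_dc`, the Dublin Core prefix defined by OAI-PMH) changes only the `metadataPrefix` parameter; whether this endpoint serves a given prefix is an assumption to verify against the repository.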

Provenance
Creator	Smit, Renate (ORCID: 0009-0005-1070-636X)
Publisher	DataverseNL
Contributor	Pepping, Kay
Publication Year	2025
Rights	CC-BY-4.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by/4.0
OpenAccess	true
Contact	Pepping, Kay (Huygens Instituut)

Representation
Resource Type	Dataset
Format	text/tsv; application/pdf
Size	36846; 54078; 70503; 56850; 27253; 68000; 100734; 100399; 44683; 27791; 30615; 34039; 6993; 6073; 28961; 8415; 7168; 9005; 9750; 26174; 185805
Version	1.0
Discipline	Humanities