HeiCuBeDa Hilprecht - Heidelberg Cuneiform Benchmark Dataset for the Hilprecht Collection

The number of known cuneiform tablets is estimated to be in the hundreds of thousands. A fraction has been published as photographs and manual tracings printed in books. The online Cuneiform Digital Library Initiative (CDLI) catalog collects these publications, includes some of the images, and provides metadata for more than 100,000 tablets. While 3D acquisition is the most modern way to document tablets, the number of 3D datasets is relatively small and they are often not openly accessible. However, the Hilprecht Archive Online (HAO) provides 1977 high-resolution 3D scans of tablets under an Open Access license. Although both the HAO and the CDLI are publicly accessible, large-scale machine learning and pattern recognition on cuneiform tablets remain elusive, because the data is only accessible by navigating web pages, the tablet identifiers are inconsistent between collections, and the 3D data is unprepared and challenging for automated processing. We enable large-scale analysis of cuneiform tablets with this HeiCuBeDa Hilprecht assembly, a cross-referenced benchmark dataset of processed cuneiform tablets: (i) frontally aligned 3D tablets with pre-computed high-dimensional surface features, (ii) six-view raster images for off-the-shelf image processing, and (iii) metadata, transcriptions, and transliterations for a subset of 707 tablets, for learning alignment between 3D data, images, and linguistic expression. This is the first dataset of its kind, and of its size, in cuneiform research. The benchmark dataset is prepared for ease of use and immediate availability for computational research, lowering the barrier to experimenting with and applying standard methods of analysis. A Python script is provided to retrieve and compute an updated JSON database of the CDLI's metadata and raster images. Up-to-date code and metadata are also available at https://gitlab.com/fcgl/releases/-/tree/master/mara_icdar_2019.

GigaMesh Software Framework, 181100 to 190300

Hilprecht Sammlung, Jena, Germany, https://hilprecht.mpiwg-berlin.mpg.de/

Cuneiform Digital Library Initiative (CDLI) https://cdli.ucla.edu/

Further Identifiers of the persons involved:

Hubert Mara: ORCID: https://orcid.org/0000-0002-2004-4153, Wikidata: https://www.wikidata.org/wiki/Q97924674
Bartosz Bogacz: ORCID: https://orcid.org/0000-0002-8323-5694, Wikidata: https://www.wikidata.org/wiki/Q102869220
Paul Victor Bayer: ORCID: https://orcid.org/0000-0003-1528-5531

Identifier
DOI https://doi.org/10.11588/DATA/IE8CCN
Related Identifier IsCitedBy https://doi.org/10.1109/ICDAR.2019.00032
Related Identifier IsCitedBy https://doi.org/10.1109/ICFHR2020.2020.00053
Related Identifier IsCitedBy https://doi.org/10.2312/VAST/VAST10/131-138
Related Identifier IsCitedBy https://doi.org/10.11588/heidok.00013890
Metadata Access https://heidata.uni-heidelberg.de/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.11588/DATA/IE8CCN
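The Metadata Access entry above is an OAI-PMH GetRecord request. As a minimal sketch, the same URL can be assembled from the DOI with Python's standard library. The function names `oai_get_record_url` and `fetch_record_xml` are hypothetical helpers introduced here for illustration and are not part of the dataset's own script; the endpoint, verb, metadata prefix, and identifier are taken from the entry above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint and DOI as they appear in the Identifier block of this record.
OAI_ENDPOINT = "https://heidata.uni-heidelberg.de/oai"
DATASET_DOI = "10.11588/DATA/IE8CCN"


def oai_get_record_url(doi, metadata_prefix="oai_datacite", endpoint=OAI_ENDPOINT):
    """Build an OAI-PMH GetRecord URL for a DOI (hypothetical helper)."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "metadataPrefix": metadata_prefix,
            "identifier": f"doi:{doi}",
        },
        safe=":/",  # keep the DOI readable, matching the URL listed in the record
    )
    return f"{endpoint}?{query}"


def fetch_record_xml(doi, timeout=30):
    """Download the metadata record as XML text (requires network access)."""
    with urlopen(oai_get_record_url(doi), timeout=timeout) as response:
        return response.read().decode("utf-8")


print(oai_get_record_url(DATASET_DOI))
```

Calling `fetch_record_xml(DATASET_DOI)` would return the DataCite-formatted XML, assuming the endpoint is reachable; the XML layout is not described in this record, so parsing it is left out of the sketch.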
Provenance
Creator Mara, Hubert
Publisher heiDATA
Contributor Mara, Hubert; Bogacz, Bartosz; Bayer, Paul Victor
Publication Year 2019
Rights info:eu-repo/semantics/openAccess
OpenAccess true
Contact Mara, Hubert (IWR, Heidelberg University)
Representation
Resource Type Cuneiform tablets; Dataset
Format application/pdf; application/zip; application/json; text/x-python; application/x-tar
Size 29071352; 9773; 9253724556; 10640638461; 11215516; 35525; 20612858612; 5602761013; 7815347423; 9375286640; 11596564683; 12539141376; 23310997074; 26764166031; 34657998299; 27073603522; 21546176394; 17854994089; 8915439378; 19354870934; 11898317258; 9925492409; 13105697463; 17244915805; 14865223908; 9030131234; 15436124160; 4197980160; 5837168640; 6961643520; 8665210880; 9382451200; 16872478720; 20110469120; 25722337280; 20155678720; 16076257280; 13342832640; 6617569280; 14500362240; 8874915840; 7410718720; 9892382720; 12968468480; 11076096000; 6729134080
Version 2.0
Discipline Humanities
Spatial Coverage Heidelberg, Germany
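
The Size field above is a semicolon-separated list of raw numbers. The following sketch turns such a field into readable figures, for example to estimate disk space before downloading. It assumes the values are byte counts with one entry per file, which the record does not state explicitly; `parse_sizes` and `format_size` are hypothetical helper names, and only the first three values of the field are used as sample input.

```python
def parse_sizes(field):
    """Split a semicolon-separated Size field into integers (assumed to be bytes)."""
    return [int(part) for part in field.split(";") if part.strip()]


def format_size(num_bytes):
    """Format a byte count using decimal (SI) units, e.g. 1500 -> '1.5 kB'."""
    units = ["B", "kB", "MB", "GB", "TB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the value below 1000,
        # or at the largest unit available.
        if value < 1000 or unit == units[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000


# Sample input: the first three entries of the Size field in this record.
sample_field = "29071352; 9773; 9253724556"
sizes = parse_sizes(sample_field)

for size in sizes:
    print(format_size(size))
print("total:", format_size(sum(sizes)))
```

Pasting the complete Size field into `sample_field` gives the total for all listed files under the same assumption.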