Archaeological entities and timespans extracted from all archaeology documents available in DANS EASY in 2017

We trained a BERT language model for Dutch archaeology and fine-tuned it to perform Named Entity Recognition for six categories of entity: artefacts, materials, time periods, places, contexts and species. For each document, we extracted all entities and translated time periods to year ranges. All this information is stored, together with DANS metadata such as author and title, in one JSON file per document. This dataset is an output of Alex Brandsen's PhD research for the AGNES search engine project.
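
As a rough illustration of the last two steps described above (translating time periods to year ranges and storing the result with document metadata as JSON), the following is a minimal sketch. It is not the project's actual code or file schema: the lookup table `PERIOD_YEARS`, the functions `period_to_range` and `build_record`, all field names, and the year values are hypothetical and chosen only for illustration. The entity list stands in for the output of the NER model.

```python
import json

# Hypothetical lookup table: period label -> (start_year, end_year).
# Negative years denote BC. The values are illustrative placeholders,
# not the ranges used in the dataset.
PERIOD_YEARS = {
    "roman period": (-12, 450),
    "middle ages": (450, 1500),
}


def period_to_range(label):
    """Return a (start, end) year tuple for a period label, or None if unknown."""
    return PERIOD_YEARS.get(label.strip().lower())


def build_record(metadata, entities):
    """Combine document metadata with extracted entities into one dict.

    `entities` is a list of (text, category) pairs, standing in for NER output.
    """
    record = dict(metadata)
    record["entities"] = []
    for text, category in entities:
        entry = {"text": text, "category": category}
        if category == "time period":
            years = period_to_range(text)
            if years is not None:
                entry["start_year"], entry["end_year"] = years
        record["entities"].append(entry)
    return record


# Assumed example input; the metadata keys are hypothetical.
record = build_record(
    {"title": "Example report", "author": "A. Author"},
    [("Roman period", "time period"), ("flint", "material")],
)
print(json.dumps(record, indent=2))
```

In this sketch, only entities in the time period category receive year fields, and unknown period labels are kept without a range rather than guessed.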

Identifier
DOI https://doi.org/10.17026/dans-zcs-7b72
Metadata Access https://archaeology.datastations.nl/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.17026/dans-zcs-7b72
Provenance
Creator A Brandsen
Publisher DANS Data Station Archaeology
Contributor Alex Brandsen
Publication Year 2021
Rights CC-BY-NC-4.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by-nc/4.0
OpenAccess true
Contact Alex Brandsen
Representation
Resource Type Dataset
Format text/xml; application/zip
Size 3941; 4465; 822; 73081844; 2979
Version 2.0
Discipline Ancient Cultures; Archaeology; Humanities