A Human-Annotated Dataset for Language Modeling and Named Entity Recognition in Medieval Documents


This is an open dataset of sentences from 19th- and 20th-century letterpress reprints of documents from the Hussite era. The dataset contains a corpus for language modeling and human annotations for named entity recognition (NER).

PID http://hdl.handle.net/11234/1-4936
Related Identifier https://nlp.fi.muni.cz/projects/ahisto/ner-dataset
Related Identifier http://hdl.handle.net/11234/1-5024
Related Identifier https://starfos.tacr.cz/en/project/TL03000365
Metadata Access http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11234/1-4936
Creator Novotný, Vít; Luger, Kristýna; Štefánik, Michal; Vrabcová, Tereza; Horák, Aleš
Publisher Masaryk University, Brno
Publication Year 2022
Rights Public Domain Dedication (CC Zero); http://creativecommons.org/publicdomain/zero/1.0/; PUB
OpenAccess true
Contact lindat-help(at)ufal.mff.cuni.cz
Language Czech; English; German; Latin
Resource Type corpus
Format text/plain; charset=utf-8; application/zip; downloadable_files_count: 2
Discipline Linguistics
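
The Metadata Access URL above is an OAI-PMH GetRecord request that asks for the record in Dublin Core (oai_dc). As a minimal sketch, the request URL could be rebuilt and a response parsed as follows. The function names (build_get_record_url, fetch_record, parse_dublin_core) are hypothetical helpers introduced for illustration, and the sketch assumes the endpoint returns a standard OAI-PMH response with Dublin Core elements; it makes no claim about which fields the actual record contains.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Endpoint and identifier copied from the Metadata Access line of this record.
OAI_ENDPOINT = "http://lindat.mff.cuni.cz/repository/oai/request"
RECORD_ID = "oai:lindat.mff.cuni.cz:11234/1-4936"

# Standard Dublin Core element namespace used inside oai_dc payloads.
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def build_get_record_url(identifier=RECORD_ID, metadata_prefix="oai_dc"):
    """Build an OAI-PMH GetRecord URL (hypothetical helper)."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "metadataPrefix": metadata_prefix,
            "identifier": identifier,
        },
        safe=":/",  # keep the identifier readable instead of percent-encoding it
    )
    return f"{OAI_ENDPOINT}?{query}"


def fetch_record(url, timeout=30):
    """Download the raw XML response as text (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_dublin_core(xml_text):
    """Collect every Dublin Core element into a dict of name -> list of values."""
    fields = {}
    for element in ET.fromstring(xml_text).iter():
        if element.tag.startswith(DC_NS):
            name = element.tag[len(DC_NS):]
            fields.setdefault(name, []).append((element.text or "").strip())
    return fields
```

Calling parse_dublin_core(fetch_record(build_get_record_url())) would then return a dictionary keyed by Dublin Core element name (for example "creator" or "language"), with each value a list because elements may repeat.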