A Human-Annotated Dataset for Language Modeling and Named Entity Recognition in Medieval Documents (2023-01-05)

This is an open dataset of sentences from 19th- and 20th-century letterpress reprints of documents from the Hussite era. It contains a corpus for language modeling together with human annotations for named entity recognition (NER).

Identifier
PID http://hdl.handle.net/11234/1-5024
Related Identifier https://nlp.fi.muni.cz/projects/ahisto/ner-dataset
Related Identifier http://hdl.handle.net/11234/1-4936
Related Identifier https://starfos.tacr.cz/en/project/TL03000365
Metadata Access http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11234/1-5024
Provenance
Creator Novotný, Vít; Luger, Kristýna; Štefánik, Michal; Vrabcová, Tereza; Horák, Aleš
Publisher Masaryk University, Brno
Publication Year 2022
Rights Public Domain Dedication (CC Zero); http://creativecommons.org/publicdomain/zero/1.0/; PUB
OpenAccess true
Contact lindat-help(at)ufal.mff.cuni.cz
Representation
Language Czech; English; German; Latin
Resource Type corpus
Format text/plain; charset=utf-8; application/zip; downloadable_files_count: 3
Discipline Linguistics
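
The Metadata Access URL listed under Identifier follows the OAI-PMH GetRecord request pattern. The following is a minimal sketch, not an official client, of how that URL could be rebuilt from the handle 11234/1-5024 using only the Python standard library. The helper names oai_get_record_url and fetch_record are hypothetical, and the endpoint and the "oai:lindat.mff.cuni.cz:" identifier prefix are simply copied from the URL in this record.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the Metadata Access URL in this record.
OAI_ENDPOINT = "http://lindat.mff.cuni.cz/repository/oai/request"


def oai_get_record_url(handle: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH GetRecord URL for a handle such as '11234/1-5024'.

    Hypothetical helper: the identifier prefix is an assumption based on
    the URL shown in this record.
    """
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": f"oai:lindat.mff.cuni.cz:{handle}",
    }
    # safe=":/" leaves the colons and slash of the identifier unescaped,
    # so the result matches the URL as written in the record.
    return f"{OAI_ENDPOINT}?{urlencode(params, safe=':/')}"


def fetch_record(handle: str, timeout: float = 30.0) -> str:
    """Download the Dublin Core record as XML text (requires network access)."""
    with urlopen(oai_get_record_url(handle), timeout=timeout) as response:
        return response.read().decode("utf-8")


url = oai_get_record_url("11234/1-5024")
print(url)
```

Calling fetch_record("11234/1-5024") would request the same record over the network. It is left uncalled here so that the sketch runs offline.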