-
Corpus of Slovenian periodicals (1771-1914) sPeriodika 1.0
The corpus of Slovenian periodicals sPeriodika contains linguistically annotated periodicals published during the 18th, 19th, and the beginning of the 20th century (1771-1914). The... -
Uniform Meaning Representation 2.1 (Czech and Latin)
Czech and Latin UMR data, both manually annotated and programmatically converted from manually annotated tectogrammatical data. -
Multilingual comparable corpora of parliamentary debates ParlaMint 5.0
ParlaMint 5.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and... -
Desam v2.0
DESAM is a Czech morphologically annotated corpus which has been manually disambiguated. Each token is annotated for lemma, part-of-speech and all grammatical categories using the... -
Carniolan Provincial Assembly corpus Kranjska 1.0
The corpus contains meeting proceedings of the Carniolan Provincial Assembly from 1861 to 1913 (Obravnave deželnega zbora kranjskega / Bericht über die Verhandlungen des... -
Linguistically annotated multilingual comparable corpora of parliamentary deb...
ParlaMint-en.ana 5.0 is the English machine translation of the ParlaMint.ana 5.0 (http://hdl.handle.net/11356/2005) set of corpora of parliamentary debates across Europe. The... -
Linguistically annotated multilingual comparable corpora of parliamentary deb...
ParlaMint 5.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and... -
Multilingual comparable corpora of parliamentary debates ParlaMint 4.1
ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and... -
Linguistically annotated multilingual comparable corpora of parliamentary deb...
ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and... -
Linguistically annotated multilingual comparable corpora of parliamentary deb...
ParlaMint-en.ana 4.1 is the English machine translation of the ParlaMint.ana 4.1 (http://hdl.handle.net/11356/1911) set of corpora of parliamentary debates across Europe. The... -
ParCzech 4.0
The ParCzech 4.0 corpus consists of stenographic protocols that record the Chamber of Deputies' meetings in the 7th term (2013-2017), the 8th term (2017-2021) and the current... -
Swedish-speaking population of Finland - statistics
Data about the Swedish-speaking minority in Finland from Finstat, used for a report in PowerBI -
Possessive Pronoun Preference
The contribution includes the data frames and the R script (Markdown file) belonging to the paper "Morphological and Pragmatic Conditioning of Reflexivity in Possessive... -
LegISTyr test set
LegISTyr is a machine translation test set for evaluating the quality of legal terminology translation from Italian to South Tyrolean German, a minor standard variety of German.... -
LITUND corpus v1
LITUND contains two comparable corpora: 1. Unreliable news texts. 147 full-text articles (100,678 words) identified as misleading by professional fact-checkers. The corpus... -
MultiCo
The MultiCo multimodal corpus is one of the outcomes of the project "Digital Research Infrastructure for the Humanities and Arts Studies DARIAH-PL." This project was funded by... -
Monitor corpus of Slovene Trendi 2025-05
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 57 publishers. Trendi 2025-05 covers the period from January... -
MorfoCzech
A dictionary of morphologically segmented word forms in Czech. Rules of manual segmentation are described in Pelegrinová, K., Mačutek, J., Čech, R. (2021). The Menzerath-Altmann... -
Deutsches Wörterbuch (1DWB, by Jacob and Wilhelm Grimm)
Retro-digitized version of the first edition of the Deutsches Wörterbuch by Jacob and Wilhelm Grimm, originally published from 1854 to 1960 -
TITUS Middle Welsh
ca. 20,000 tokens; linked with relational database; XML encoding in progress