Replication data for: The beginning of a beautiful friendship: rule-based and...
We describe and compare two tools for processing Middle Russian texts. Both tools provide lemmatization, part-of-speech and morphological annotation. One (“RNC”) was developed...
Replication data for: The ongoing eclipse of possessive suffixes in North Saa...
North Saami is replacing the use of possessive suffixes on nouns with a morphologically simpler analytic construction. Our data (more than 2,000 examples culled from more than 0.5 million words) track...
Diakorp v6: diachronic corpus of Czech
Diachronic corpus of Czech with a size of 3.45 million words (4.1 million tokens). It contains 116 texts from the 14th to the 20th century. The texts are transcribed, not...
B4 Heliand
Heliand 1, 4 and 5: complete text, status: final, digitalization, translation to Modern German, manually annotated with parts of speech, syntactic categories, grammatical...
Frequency list of textbook vocabulary by level of education in elementary and...
The dataset contains a list of 11,906 words (lemmas with part-of-speech information) and their frequency of occurrence in a corpus of Slovenian textbooks, covering elementary...
A Digital Dictionary of Tunis Arabic - TUNICO (ELEXIS)
A corpus-based dictionary, enriched with historical data. The dictionary was not only built on data from the corpus of spoken language that was compiled in the same project, but...
Diachrono
Polish texts from the 17th to the 19th century.
Diachrono - sample
Sample of the diachronic corpus.
diachronic1
HISTORY
