-
Monolingual Mining Corpus - RudKorP (ELEXIS)
RudKorP - Rudarski javno dostupan korpus - Serbian Public Mining Corpus, a specialized corpus in the field of mining and mineral resource exploitation, containing research papers,... -
Concordance of Trubar's Gospel of St. Matthew (1555) (ELEXIS)
Concordance of Trubar's Gospel of St. Matthew (1555). The 23,603 concordances represent a transcription of the book "Ta evangeli sv. Matevža" (1555) by Primož Trubar. See also:... -
Slovene text simplification dataset SloTS
To increase the accessibility and diversity of easy reading in Slovenian and to create a prototype system that automatically simplifies texts in Slovenian, we prepared a dataset... -
Eesti keele spontaanse kõne foneetiline korpus v.1.0.0 Phonetic Corpus of Es...
The aim of the corpus is to compile a large number of high-quality recordings of spontaneous Estonian and segment them phonetically at different levels. The project started in autumn... -
Eesti keele segakorpus: Seadused Corpus of Estonian law texts
Corpus of Estonian and European law texts. TEI P5 XML markup, UTF-8 encoding. More info at http://www.cl.ut.ee/korpused/segakorpus/seadused/ Corpus of law texts in Estonian,... -
Segakorpus: Doktoritööd Corpus of Estonian scientific texts
The corpus contains 5 million words of Estonian-language scientific literature: doctoral theses (2.3 million words) and research articles. TEI P5 XML markup, UTF-8 encoding. More info at... -
Pindsüntaktiliselt analüüsitud korpus Estonian corpus with shallow syntactic...
This corpus is a monolingual corpus with Constraint Grammar-style shallow syntactic annotations. -
Morfoloogiliselt ühestatud korpus Corpus of morphologically disambiguated Es...
Manually morphologically disambiguated corpus. More info at http://www.cl.ut.ee/korpused/morfkorpus/index.php?lang=en Manually annotated corpus. Available for download and via Korp... -
Vana kirjakeele korpus Corpus of Old Written Estonian
The corpus is geared towards researchers of the history and development of written Estonian. The texts included are from the 16th to 18th centuries. From the 16th century all known printed and... -
Suulise keele korpus Corpus of Spoken Estonian
The Department of Estonian Language initiated the corpus of spoken Estonian in 1997. The corpus is compiled by the research group of Spoken Estonian (Tiit Hennoste, Airi... -
Nimeüksuste korpus Estonian NER corpus
Corpus containing morphologically analyzed articles with named entity annotations (persons, organizations, locations) in BIO format. -
Eesti ilukirjanduse korpus Corpus of Estonian fiction
Corpus of Estonian fiction from 1990 onward. 5.6 million words in total. More info at http://www.cl.ut.ee/korpused/segakorpus/eesti_ilukirjandus_1990 A text corpus containing Estonian... -
The IPI PAN Corpus
written, general, monolingual, synchronic; 250 million; XML (XCES), morphosyntactic, structural, metadata -
Szeged Corpus 1.0
written, monolingual, general, manually POS annotated reference corpus; 1,247,546 tokens; MSD tagset, XML (TEIxLite) files -
Szeged Corpus 2.0
written, monolingual, general, manually POS annotated reference corpus; 1,459,288 tokens; MSD tagset, XML (TEI P4) files -
CoCzeFLA Chroma 2022.07
Transcripts of longitudinal audio recordings of 7 typically developing monolingual Czech children between the ages of 1;7 and 3;9. Files are in plain text with UTF-8 encoding. Each file represents one... -
Eesti puudepanga korpus Estonian Treebank
Estonian Treebank is available both in the VISL and TigerXML format. Esttre consists of ca 1400 manually annotated sentences (10600 tokens), the text classes represented in the... -
Segakorpus: Riigikogu Corpus of the Proceedings of Estonian Parliament
Riigikogu corpus. TEI P5 XML markup, UTF-8 encoding. More info at http://www.cl.ut.ee/korpused/segakorpus/riigikogu/index.php?lang=et Corpus of the Proceedings of Estonian... -
Eesti murdekorpus Estonian Dialect Corpus
Corpus. More info at http://www.murre.ut.ee/estonian-dialect-corpus/ The dialect corpus consists of: 1) Dialect recordings. The corpus is based on dialect recordings which... -
Eesti ajakirjanduse korpus Corpus of Estonian newspaper texts
The corpus contains Estonian newspapers, 182 million words. TEI P5 XML markup, UTF-8 encoding. More info at http://www.cl.ut.ee/korpused/ Corpus of Estonian newspaper texts, 182...