Dataset - B2FIND

Eesti keele ühendkorpus 2023 (annoteerimata) Estonian National Corpus 2023 (...

Estonian corpus of written texts. Consists of the Estonian Reference Corpus (1990s–2008), contemporary and old literature, Estonian Web (2013, 2017, 2019, 2021, 2023), Timestamped...

Eesti ajakirjanduse korpus Corpus of Estonian newspaper texts

The corpus contains Estonian newspapers, 182 million words. TEI P5 XML markup, UTF-8 encoding. More info at http://www.cl.ut.ee/korpused/ Corpus of Estonian newspaper texts, 182...

Eesti murdekorpus Estonian Dialect Corpus

Corpus. More info at http://www.murre.ut.ee/estonian-dialect-corpus/ The dialect corpus consists of: 1) Dialect recordings. The corpus is based on dialect recordings which...

Eesti emotsionaalse kõne korpus Estonian Emotional Speech Corpus

The corpus contains 1234 Estonian sentences expressing anger, joy, and sadness, as well as neutral sentences. Female voice, 44.1 kHz, 16-bit, mono; wav, textgrid:...

Segakorpus: Riigikogu Corpus of the Proceedings of Estonian Parliament

Riigikogu corpus. TEI P5 XML markup, UTF-8 encoding. More info at http://www.cl.ut.ee/korpused/segakorpus/riigikogu/index.php?lang=et Corpus of the Proceedings of Estonian...

Morphological analyzer for Estonian ESTMORF

ESTMORF is a computer program for analysing unrestricted Estonian text. ESTMORF is implemented in a most straightforward way: it compares word forms of the running text with...

Eesti puudepanga korpus Estonian Treebank

Estonian Treebank is available in both the VISL and TigerXML formats. Esttre consists of ca 1400 manually annotated sentences (10600 tokens), the text classes represented in the...

Vana kirjakeele korpus Corpus of Old Written Estonian

The corpus is geared towards researchers of the history and development of written Estonian. The texts included are from the 16th to 18th centuries. From the 16th century, all known printed and...

Suulise keele korpus Corpus of Spoken Estonian

The Department of Estonian Language initiated the corpus of spoken Estonian in 1997. The corpus is compiled by the research group of Spoken Estonian (Tiit Hennoste, Airi...

Eesti-inglise paralleelkorpus Estonian-English parallel corpus

Corpus. More info at http://www.cl.ut.ee/korpused/paralleel/index.php?lang=en Annotated and sentence-aligned parallel text corpus; contains: 1. Estonian laws and their...

Estonian Wordnet (kb69a)

The atom of a wordnet-type thesaurus is a synonym set (also called a synset), which is a set containing all the synonymous words or multi-word units that express the same...

Sagedussõnastik Estonian Frequency Dictionary

Frequency lists compiled from a 0.5-million-word fiction corpus (from 1992-1998) and a 0.5-million-word newspaper corpus (1995-1999). Three...

Nimeüksuste korpus Estonian NER corpus

Corpus containing morphologically analyzed articles with named entity annotations (persons, organizations, locations) in BIO format.

Eesti Keele Instituudi reeglipõhise morfoloogia tööriistad Tools of the IEL ...

The rule-based morphology toolkit of the Institute of the Estonian Language contains separately usable modules for syllabification, type detection, morphological analysis and synthesis...

Eesti ilukirjanduse korpus Corpus of Estonian fiction

Corpus of Estonian fiction from 1990 onward. 5.6 million words in total. More info at http://www.cl.ut.ee/korpused/segakorpus/eesti_ilukirjandus_1990 A text corpus containing Estonian...

Morfoloogiliselt ühestatud korpus Corpus of morphologically disambiguated Es...

Manually morphologically disambiguated corpus. More info at http://www.cl.ut.ee/korpused/morfkorpus/index.php?lang=en Manually annotated corpus. Available for download and via Korp...

Eesti-läti ehitusalane paralleelkorpus Estonian-Latvian Parallel Corpus of b...

Corpus. Parallel corpus of informational texts about building foams and sealants in Latvian and Estonian.

Segakorpus: Doktoritööd Corpus of Estonian scientific texts

The corpus contains 5 million words of Estonian-language scientific literature: doctoral theses (2.3 million words) and scientific articles. TEI P5 XML markup, UTF-8 encoding. More info at...

Eesti avatud paralleelkorpus Estonian Open Parallel Corpus

The aim of the "Estonian Open Parallel Corpus" project is to create a substantial amount of language resources for improving statistical machine translation systems. The project contributes to achieving a situation...

Eesti keele spontaanse kõne foneetiline korpus v.1.0.0 Phonetic Corpus of Es...

The aim of the corpus is to compile a large number of quality recordings of spontaneous Estonian and segment them phonetically on different levels. The project started in autumn...

53 datasets found