Segakorpus: Doktoritööd Corpus of Estonian scientific texts
The corpus contains 5 million words of Estonian-language scientific literature: doctoral theses (2.3 million words) and research articles. TEI P5 XML markup, UTF-8 encoding. More info at... -
Aligned Estonian-Icelandic ICD-10
Aligned Estonian and Icelandic versions of the WHO's International Classification of Diseases (ICD-10) -
Eesti avatud paralleelkorpus Estonian Open Parallel Corpus
The aim of the project "Eesti avatud paralleelkorpus" (Estonian Open Parallel Corpus) is to create a substantial body of language resources for improving statistical machine translation systems. The project helps to bring about a situation... -
Vana kirjakeele korpus Corpus of Old Written Estonian
The corpus is geared towards researchers of the history and development of written Estonian. The texts included are from the 16th-18th centuries. From the 16th century, all known printed and... -
Morphological analyzer for Estonian ESTMORF
ESTMORF is a computer program for analysing unrestricted Estonian text. ESTMORF is implemented in a most straightforward way: it compares word forms of the running text with... -
Eesti ajakirjanduse korpus Corpus of Estonian newspaper texts
The corpus contains Estonian newspapers, 182 million words. TEI P5 XML markup, UTF-8 encoding. More info at http://www.cl.ut.ee/korpused/ Corpus of Estonian newspaper texts, 182... -
Sagedussõnastik Estonian Frequency Dictionary
Frequency lists compiled from a 0.5-million-word fiction corpus (from the years 1992-1998) and a 0.5-million-word newspaper corpus (1995-1999). Three... -
Estonian Wordnet (kb69a)
The atom of a wordnet-type thesaurus is a synonym set (also called a synset), which is a set containing all the synonymous words or multi-word units that express the same... -
Pindsüntaktiliselt analüüsitud korpus Estonian corpus with shallow syntactic...
This corpus is a monolingual corpus with Constraint Grammar-style shallow syntactic annotations. -
Estonian WordNet (kb65a-4)
Compiled manually according to the EuroWordNet project. More info at http://www.cl.ut.ee/ressursid/teksaurus -
Wordlist of the Contemporary Corpus of Lithuanian Language in the Face of War...
We present the comparative wordlist based on the Corpus of the Contemporary Lithuanian Language (CCLL2 version 2, pre-2020), supplemented by the media (courtesy of the news... -
Eesti keele segakorpus: Seadused Corpus of Estonian law texts
Corpus of Estonian and European law texts. TEI P5 XML markup, UTF-8 encoding. More info at http://www.cl.ut.ee/korpused/segakorpus/seadused/ Corpus of law texts in Estonian,... -
Morfoloogiliselt ühestatud korpus Corpus of morphologically disambiguated Es...
Manually morphologically disambiguated corpus. More info at http://www.cl.ut.ee/korpused/morfkorpus/index.php?lang=en Manually annotated corpus. Available for download and via Korp... -
Eesti Keele Instituudi reeglipõhise morfoloogia tööriistad Tools of the IEL ...
The rule-based morphology toolkit of the Institute of the Estonian Language (Eesti Keele Instituut) contains separately usable modules for syllabification, type detection, morphological analysis and synthesis... -
LitLat BERT
Trilingual BERT-like (Bidirectional Encoder Representations from Transformers) model, trained on Lithuanian, Latvian, and English data. State of the art tool representing... -
Lithuanian morphologically annotated corpus - MATAS v3.0
MATAS corpus (version 3.0). DESCRIPTION: Updated, manually checked, morphologically annotated corpus MATAS. LANGUAGE: Lithuanian. PREVIOUS VERSIONS: 1. MATAS v0.2... -
English-French-Lithuanian Parallel Corpus of EU Financial Documents
The corpus comprises 154 EU legislative documents (English documents and their translations into French and Lithuanian) related to various financial issues and enacted in... -
Wordlist of Lemmas from the Joint Corpus of Lithuanian
The resource is a wordlist of lemmas from the Joint Corpus of Lithuanian (JCL). The JCL merges three corpora: 1) the Vilnius University corpus compiled from the Lithuanian... -
English-Lithuanian Comparable Cybersecurity Corpus - DVITAS
The English-Lithuanian comparable corpus (DVITAS COMPARABLE) is morphologically annotated. It includes English and Lithuanian original texts on cybersecurity from the time... -
Lithuanian 4-gram dataset
Dataset of 4-grams with frequencies extracted from the Delfi.lt corpus (~70 million words, period: March 2014 - November 2016). First, the corpus was split into sentences, then symbol...
