-
Eesti keele segakorpus: Seadused Corpus of Estonian law texts
Corpus of Estonian and European law texts. TEI P5 XML markup, UTF-8 encoding. More info at http://www.cl.ut.ee/korpused/segakorpus/seadused/ Corpus of law texts in Estonian,... -
Morfoloogiliselt ühestatud korpus Corpus of morphologically disambiguated Es...
Manually morphologically disambiguated corpus. More info at http://www.cl.ut.ee/korpused/morfkorpus/index.php?lang=en Manually annotated corpus. Available for download and via Korp... -
LitLat BERT
Trilingual BERT-like (Bidirectional Encoder Representations from Transformers) model, trained on Lithuanian, Latvian, and English data. State-of-the-art tool representing... -
Lithuanian morphologically annotated corpus - MATAS v3.0
MATAS corpus (version 3.0). DESCRIPTION: Updated, manually checked, morphologically annotated corpus MATAS. LANGUAGE: Lithuanian. PREVIOUS VERSIONS: 1. MATAS v0.2... -
English-French-Lithuanian Parallel Corpus of EU Financial Documents
The corpus comprises 154 EU legislative documents (English documents and their translations into French and Lithuanian) related to various financial issues and enacted in... -
Wordlist of Lemmas from the Joint Corpus of Lithuanian
The resource is a wordlist of lemmas from the Joint Corpus of Lithuanian (JCL). The JCL is a merge of three corpora: 1) Vilnius university corpus compiled out of the Lithuanian... -
English-Lithuanian Comparable Cybersecurity Corpus - DVITAS
The English-Lithuanian comparable corpus (DVITAS COMPARABLE) is morphologically annotated. It includes English and Lithuanian original texts on cybersecurity from the time... -
Lithuanian 4-gram dataset
Dataset of 4-grams with frequencies extracted from the Delfi.lt corpus (~70 million words, period: March 2014 - November 2016). First, the corpus was split into sentences, then symbol... -
Corpus KLASIUS v.02
900 extracts for the corpus were collected from manuals and publications for secondary school students included in the compulsory bibliographic descriptions of the university... -
Lithuanian 2-gram dataset
Dataset of 2-grams with frequencies extracted from the Delfi.lt corpus (~70 million words, period: March 2014 - November 2016). First, the corpus was split into sentences, then symbol... -
Lithuanian Treebank ALKSNIS
ALKSNIS v2.1 consists of 2,355 syntactically annotated sentences in the PML (Prague Mark-up Language) format. The format allows researchers to visualise and edit... -
Lithuanian Parliament Corpus for Authorship Attribution
The 23.9-million-word Lithuanian Parliament corpus is specially designed for the authorship attribution task. The corpus consists of 111 thousand samples of speech transcripts by 147... -
Lithuanian Spelling Checker V.1.0.45 for LibreOffice and OpenOffice
Lithuanian spelling checker for LIBREOFFICE / OPENOFFICE 2020-04-09 version 1.0.45 -
Lithuanian Spelling Checker V.1.0.45 for Linux
Lithuanian spelling checker for Linux 2020-04-07 version 1.0.45 -
Wordlist of the Contemporary Corpus of Lithuanian language
Frequency lists of word forms from the Contemporary Corpus of the Lithuanian Language (Dabartinės lietuvių kalbos tekstynas). Corpus structure... -
LITIS v.1
Corpus of user-generated comments collected from two Lithuanian portals: www.delfi.lt and www.lrytas.lt. Each comment is in a separate file (TXT). Each file contains: a comment,... -
Frequency lists of pivot words and GSE counts
The resource contains data used to estimate the number of words in Lithuanian texts indexed by the selected Global Search Engines (GSE), namely Google (by Alphabet Inc.), Bing... -
Lithuanian Corpus of the EU Primary and Secondary Law Acts of the Period 2015...
A 274,460-word corpus comprising selected primary and secondary law acts of the EU from the period 2015-2017. The corpus was compiled from documents containing words with the root... -
Database of Lithuanian Multiword Expressions
The database of Lithuanian multiword expressions (MWEs) contains bi-gram and tri-gram MWEs that occurred in the DELFI.lt corpus (http://tekstynas.mwe.lt/) at least 10 times. In the... -
Dual Pronoun Translation Concordances
The resource offers two data sets: concordances of dual pronoun translations from Lithuanian into English (942 concordance lines) and translations of English pronouns into...
