-
DELFI.lt corpus
DELFI.lt is corpus made of articles published by news portal DELFI.lt since March 2014 till November 2016. Metadata was collected with articles as well: author, title, date,... -
Lithuanian-English Cybersecurity Termbase v.0.1
The bilingual termbase is TBX export of the online termbase https://www.terminologue.org/csterms/. The termbase includes terms for 233 cybersecurity concepts. -
EMVAKA
Two Lithuanian language children’s corpora, collected during the EMVAKA project, consist of the Lithuanian language production by children aged 7–13: (1) spoken (73 files, c.... -
Assessment Data of the Dictionary of Modern Lithuanian versus Joint Corpora
The resource is the assessment data of The Dictionary of Modern Lithuanian, 6th edition (DML6) [1], from the point of view of its coverage in the Joint Corpus of Lithuanian... -
Lithuanian Word embeddings
GloVe type word vectors (embeddings) for Lithuanian. Delfi.lt corpus (~70 million words) and StanfordNLP were used for training. The training consisted of several stages: 1)... -
The Database of Lithuanian multiword expressions
The Database of Lithuanian multiword expressions (MWEs) is freely accessible for online search at: https://resursai.pastovu.vdu.lt/paieska/paprastoji from 2019. It contains... -
Corpus of Discourse on Crime
Specialised "Corpus of Discourse on Crime" is synchronic, monolingual, unannotated, consists of two subcorpora. Subcorpus 1: all texts on crime, published in criminal columns on... -
The Scottish Gaelic Linguistic Toolkit
A linguistic analyser for tagging, lemmatisation and parsing of Scottish Gaelic texts. Morphological and syntactic analyses are available directly from the webpage (through the... -
Read Speech Corpus (7G)
The corpus of read Lithuanian speech „7G“ was compiled in 2015-2016. The corpus consists of 352 audio recordings with a total duration of over 7 hours. Seven different speakers... -
English-Lithuanian Parallel Cybersecurity Corpus - DVITAS
English-Lithuanian parallel corpus DVITAS includes original English texts on cybersecurity and their Lithuanian translations aligned on the sentence level. The corpus was... -
Lithuanian speech-to-text Transcriber
Speech to text automatic transcriber for Lithuanian is a containerized application implemented into 17 containers. It covers four areas: administrative, legal, medical and... -
Lemmatised Wordlist of 1 m. Corpus of Contemporary Lithuanian
The lemmatised wordlist of 1 m. word Lithuanian corpus. The structure of the tab delimited text file (dazninis.txt): HeadwordPart of SpeechWordformFrequency of Occurrence. The... -
English-Lithuanian Parallel Cybersecurity Corpus - DVITAS v2.0
English-Lithuanian parallel corpus DVITAS v2 includes original English texts on cybersecurity and their Lithuanian translations aligned on the sentence level. Version 1 of the... -
DIGIRES COVID-19 ML Dataset v.1
DIGIRES COVID-19 ML dataset v.1 is a tab-separated (.tsv) file prepared for training machine learning algorithms. The training dataset was compiled from various internet public... -
Lithuanian keyboard for macOS users
This keyboard driver allows easy access of the Lithuanian letters via conventional keyboard layout a.k.a. „Lithuanian letters instead of numbers“. Essential new feature of this... -
Colloc -- A Tool for Automatic Identification of Multiword Expressions
Colloc -- a tool for automatic identification of multiword expressions (MWE) is freely available for online use at http://resursai.mwe.lt/atpazintuvas. As material for training... -
ORVELIT v3
ORVELIT v3 (Lith.Originalios ir Vertimų Lietuvių Kalbos Tekstynas) is a comparable monolingual corpus of original and translated Lithuanian consisting of four sub-corpora of... -
Corpus of the Contemporary Lithuanian Language
Corpus of the Contemporary Lithuanian Language, which comprises 208 million words, is a collection of texts designed to represent the current Lithuanian. The corpus has been... -
Pedagogic Corpus of Lithuanian
The Pedagogic Corpus of Lithuanian is a monolingual specialized corpus, prepared for learning and teaching Lithuanian in a foreign language classroom. The pedagogic corpus... -
Lithuanian morphologically annotated corpus - MATAS
MATAS v0.2 - Morphologically Annotated Lithuanian Corpus (manually checked) Contains 4 parts: Documents (21%), Fiction (19%), Periodicals (36%), Scientific texts (24%) Wordform...
