-
Lithuanian font family AISTIKA
Original TrueType font designed and hinted in Lithuania. The font complies with the ISO/IEC 10646 (Unicode) standard and has the full set of casual and accented Lithuanian... -
Lithuanian 3-gram dataset
Dataset of 3-grams with frequencies extracted from the Delfi.lt corpus (~70 million words, period: March 2014 - November 2016). First, the corpus was split into sentences, then symbol... -
DIGIRES COVID-19 Corpus v.1
DIGIRES COVID-19 Corpus v.1 consists of 351 Lithuanian media articles about the COVID-19 pandemic. The corpus was compiled from various public Lithuanian online media sources.... -
Lithuanian morphologically annotated corpus - MATAS v1.0
MATAS corpus (version 1.0). DESCRIPTION: Manually checked, morphologically annotated corpus MATAS. FORMATS: 1. CoNLL-U (CONLLU, conllu); 2. SketchEngine - tab delimited word per... -
Lithuanian 1-gram dataset
Dataset of 1-grams with frequencies extracted from the Delfi.lt corpus (~70 million words, period: March 2014 - November 2016). First, the corpus was split into sentences, then... -
Lithuanian Spelling Checker V.1.0.45 for macOS
Lithuanian spelling checker for macOS, version 1.0.45 (2020-04-10) -
English-Lithuanian Comparable Vaccination Corpus
Two news portals were selected for building the comparable corpora: the Lithuanian portal DELFI and the English portal The Guardian. The compiled corpora comprise 135 Lithuanian... -
Lithuanian Coreference Corpus
The corpus is made up of 100 articles from news portals focusing on political news, as such texts are rich in quotations and named entity... -
DELFI.lt corpus
The DELFI.lt corpus is made up of articles published by the news portal DELFI.lt from March 2014 to November 2016. Metadata was collected along with the articles: author, title, date,... -
Lithuanian-English Cybersecurity Termbase v.0.1
The bilingual termbase is a TBX export of the online termbase https://www.terminologue.org/csterms/. The termbase includes terms for 233 cybersecurity concepts. -
EMVAKA
Two Lithuanian language children’s corpora, collected during the EMVAKA project, consist of the Lithuanian language production by children aged 7–13: (1) spoken (73 files, c.... -
Assessment Data of the Dictionary of Modern Lithuanian versus Joint Corpora
The resource is the assessment data of The Dictionary of Modern Lithuanian, 6th edition (DML6) [1], from the point of view of its coverage in the Joint Corpus of Lithuanian... -
Lithuanian Word embeddings
GloVe-type word vectors (embeddings) for Lithuanian. The Delfi.lt corpus (~70 million words) and StanfordNLP were used for training. The training consisted of several stages: 1)... -
The Database of Lithuanian multiword expressions
The Database of Lithuanian multiword expressions (MWEs) has been freely accessible for online search since 2019 at: https://resursai.pastovu.vdu.lt/paieska/paprastoji. It contains... -
Corpus of Discourse on Crime
The specialised "Corpus of Discourse on Crime" is synchronic, monolingual, and unannotated, and consists of two subcorpora. Subcorpus 1: all texts on crime, published in criminal columns on... -
The Scottish Gaelic Linguistic Toolkit
A linguistic analyser for tagging, lemmatisation and parsing of Scottish Gaelic texts. Morphological and syntactic analyses are available directly from the webpage (through the... -
Read Speech Corpus (7G)
The corpus of read Lithuanian speech „7G“ was compiled in 2015-2016. The corpus consists of 352 audio recordings with a total duration of over 7 hours. Seven different speakers... -
English-Lithuanian Parallel Cybersecurity Corpus - DVITAS
English-Lithuanian parallel corpus DVITAS includes original English texts on cybersecurity and their Lithuanian translations aligned on the sentence level. The corpus was... -
Lithuanian speech-to-text Transcriber
The automatic speech-to-text transcriber for Lithuanian is a containerized application implemented as 17 containers. It covers four areas: administrative, legal, medical and... -
Lemmatised Wordlist of 1 m. Corpus of Contemporary Lithuanian
The lemmatised wordlist of the 1-million-word Lithuanian corpus. The structure of the tab-delimited text file (dazninis.txt): Headword, Part of Speech, Wordform, Frequency of Occurrence. The...
