English-Lithuanian Parallel Cybersecurity Corpus - DVITAS v2.0
The English-Lithuanian parallel corpus DVITAS v2 includes original English texts on cybersecurity and their Lithuanian translations, aligned at the sentence level. Version 1 of the...
DIGIRES COVID-19 ML Dataset v.1
DIGIRES COVID-19 ML dataset v.1 is a tab-separated (.tsv) file prepared for training machine learning algorithms. The training dataset was compiled from various internet public...
Lithuanian keyboard for macOS users
This keyboard driver provides easy access to Lithuanian letters via the conventional keyboard layout, a.k.a. „Lithuanian letters instead of numbers“. An essential new feature of this...
Colloc -- A Tool for Automatic Identification of Multiword Expressions
Colloc, a tool for automatic identification of multiword expressions (MWEs), is freely available for online use at http://resursai.mwe.lt/atpazintuvas. As material for training...
ORVELIT v3
ORVELIT v3 (Lith. Originalios ir Vertimų Lietuvių Kalbos Tekstynas) is a comparable monolingual corpus of original and translated Lithuanian consisting of four sub-corpora of...
Corpus of the Contemporary Lithuanian Language
The Corpus of the Contemporary Lithuanian Language, which comprises 208 million words, is a collection of texts designed to represent present-day Lithuanian. The corpus has been...
Pedagogic Corpus of Lithuanian
The Pedagogic Corpus of Lithuanian is a monolingual specialized corpus prepared for learning and teaching Lithuanian in a foreign-language classroom. The pedagogic corpus...
Lithuanian morphologically annotated corpus - MATAS
MATAS v0.2 (Morphologically Annotated Lithuanian Corpus, manually checked) contains 4 parts: Documents (21%), Fiction (19%), Periodicals (36%), and Scientific texts (24%). Wordform...
Language Technology Research Bibliography for Lithuanian 2016-2020
A language technology bibliography for the Lithuanian language covering the period 2016-2020. The resource is in BibTeX format and contains: 1) 91 references to research...
JABLONSKIS tagset v2
JABLONSKIS version 2 is a standard Lithuanian morphological tagset based on the abbreviations of parts of speech and other grammatical categories commonly used in...
Lithuanian Treebank ALKSNIS (2019-10-24)
ALKSNIS v3.0 consists of 3,643 syntactically annotated sentences in the PML (Prague Markup Language) format. The format allows researchers to visualise and edit...
TED-ELH Parallel Corpus
The corpus contains aligned parallel scripts of TED Talks in English, Lithuanian, and Hebrew; it consists of spoken-language data.
Survey Data on Preferences of Lithuanian Cybersecurity Terminology
The data is provided in two files: one containing the questionnaire data and the other containing the respondents' data. The questionnaire data is in a TXT file, which includes...
MariTerm v.1.2
This is an enriched version of the MariTerm maritime ontology, containing plug-ins to corresponding synsets in IWN. The resource was created within the collaboration of the...
SELEXINI corpus
We present a large, automatically annotated corpus of French. The corpus is divided into two parts: the first derived from BigScience and the second from HPLT. The annotated...
Parole+ (2017-10-16)
The Swedish PAROLE Lexicon (Svenskt PAROLE-lexikon): a language technology resource with access to syntactic information, connected to SALDO senses.
Annotated Route Description
This file set, consisting of a video stream, an audio stream, and a multimodal annotation file, is frequently used as a showcase of how to create complex multimodal annotations with the...
Model for Normalizing Historical English
This is an OpenNMT-py model for normalizing historical English into modern spelling. For usage, please see https://github.com/mikahama/natas. The model has been described in the...
Gustav Vasa's letter production (2015-05-26)
King Gustav I's registry (Konung Gustaf den förstes registratur).
Wikipedia paths
Wikipedia category embedding starting from the top-level category Biology, for English, French, and Czech. The English data are incomplete.
