-
Lithuanian Coreference Corpus
The corpus is made out of 100 articles from news portals focusing on political news, as such texts are rich in quotations and named entity... -
DELFI.lt corpus
DELFI.lt is a corpus of articles published by the news portal DELFI.lt from March 2014 to November 2016. Metadata was collected along with the articles: author, title, date,... -
EMVAKA
Two Lithuanian language children’s corpora, collected during the EMVAKA project, consist of the Lithuanian language production by children aged 7–13: (1) spoken (73 files, c.... -
Corpus of Discourse on Crime
The specialised "Corpus of Discourse on Crime" is synchronic, monolingual, and unannotated, and consists of two subcorpora. Subcorpus 1: all texts on crime, published in criminal columns on... -
Read Speech Corpus (7G)
The corpus of read Lithuanian speech „7G“ was compiled in 2015-2016. The corpus consists of 352 audio recordings with a total duration of over 7 hours. Seven different speakers... -
English-Lithuanian Parallel Cybersecurity Corpus - DVITAS
The English-Lithuanian parallel corpus DVITAS includes original English texts on cybersecurity and their Lithuanian translations aligned on the sentence level. The corpus was... -
English-Lithuanian Parallel Cybersecurity Corpus - DVITAS v2.0
The English-Lithuanian parallel corpus DVITAS v2 includes original English texts on cybersecurity and their Lithuanian translations aligned on the sentence level. Version 1 of the... -
ORVELIT v3
ORVELIT v3 (Lith. Originalios ir Vertimų Lietuvių Kalbos Tekstynas) is a comparable monolingual corpus of original and translated Lithuanian consisting of four sub-corpora of... -
Corpus of the Contemporary Lithuanian Language
The Corpus of the Contemporary Lithuanian Language, which comprises 208 million words, is a collection of texts designed to represent current Lithuanian. The corpus has been... -
Lithuanian morphologically annotated corpus - MATAS
MATAS v0.2 - Morphologically Annotated Lithuanian Corpus (manually checked). Contains 4 parts: Documents (21%), Fiction (19%), Periodicals (36%), Scientific texts (24%). Wordform... -
Lithuanian Treebank ALKSNIS (2019-10-24)
ALKSNIS v3.0. ALKSNIS v3.0 consists of 3,643 syntactically annotated sentences in the PML (Prague Mark-up Language) format. The format allows researchers to visualise and edit... -
TED-ELH Parallel Corpus
The corpus contains aligned parallel scripts of TED Talks in English, Lithuanian, and Hebrew. It contains spoken language data. -
Gustav Vasa's letter production (2015-05-26) Gustav Vasas brevproduktion (20...
King Gustav I's registry Konung Gustaf den förstes registratur -
HELLO CAMPANIA! Philippines Collection
The Philippines collection contains data for 66 speakers: 32 first generation (G1), 28 second generation (G2), 6 homeland (G0). The collection contains three folders for each... -
HELLO CAMPANIA! Bangladesh Collection
The collection contains 11 interviews with first-generation Bangladeshi migrants in Naples. It also contains language portraits of the migrants. -
HELLO CAMPANIA! Ukraina Collection
The Ukrainian collection contains data for 26 first-generation (G1) speakers, 19 females and 6 males. The collection contains three folders for each group: the... -
Concerto di Caterina Bueno - CB-CONC-042-01
Concert by Caterina Bueno - CB-CONC-042-01 -
Registrazione di un incontro con più testimoni - CB-RIC-002-01
Recording of a meeting with several witnesses - CB-RIC-002-01 -
Prove (voce maschile e chitarra) - CB-PROV-189-01
Rehearsals (male voice and guitar) - CB-PROV-189-01 -
Monitor corpus of Slovene Trendi 2024-10
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 76 publishers. Trendi 2024-10 covers the period from January...
