-
Corpus of conversational humor Krohot 1.0
The KROHOT corpus consists of 10 audio recordings of private, spontaneous conversations between two or three speakers, with a total duration of 232 minutes. Most recordings were... -
CMC training corpus Janes-Tag 2.1
Janes-Tag is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is meant as a gold-standard training and testing dataset for tokenisation, sentence... -
CMC training corpus Janes-Norm 1.2
Janes-Norm is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is meant as a gold-standard training and testing dataset for tokenisation,... -
Frequency List of Lithuanian Homoforms
The list contains 63,139 homoforms. For each homoform, the Frequency List of Lithuanian Homoforms provides the following data: 1) the homoform itself, 2) its lemma (or... -
Slovene-Japanese Learner's Dictionary sloJa 1.1
The Slovenian-Japanese online dictionary for Slovenian speaking learners of Japanese was compiled by extracting and converting the Japanese-Slovenian dictionary jaSlo 3.1... -
Ontology of topics for Slovenian as a second and foreign language ONTEM 1.0
ONTEM 1.0 comprises 1,019 manually prepared entries, each consisting of information about the lemma, part-of-speech (following the MULTEXT-East tagset for Slovenian,... -
Slovene learner corpus KOST 2.1
The corpus of Slovene as a foreign language KOST (Korpus slovenščine kot tujega jezika) contains 10,590 texts (almost 1.4 million words) written by adult speakers for whom... -
Comparable corpus of parliamentary debates ParlaMint-ES-CN 1.0
The ParlaMint-ES-CN corpus is the contribution of the Parliament of the Canary Islands (Parlamento de Canarias) to the ParlaMint collection of comparable parliamentary corpora... -
Monitor corpus of Slovene Trendi 2025-09
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 58 publishers. Trendi 2025-09 covers the period from January... -
Slovene-Japanese Learner's Dictionary sloJa 1.0
The Slovenian-Japanese online dictionary for Slovenian speaking learners of Japanese was compiled by extracting and converting the Japanese-Slovenian dictionary jaSlo 3.1... -
Slovene learner corpus KOST 2.0
The corpus of Slovene as a foreign language KOST (Korpus slovenščine kot tujega jezika) contains 8,347 texts (almost 1.3 million words) written by adult speakers for whom... -
Corpus of scientific texts of contemporary Slovenian KZB 1.0
The Corpus of scientific texts of contemporary Slovenian consists of 25 million words from scientific monographs and scientific papers written mainly between 2000 and 2023. It... -
Multilingual dataset of COVID tweets for relation-level metaphor analysis TCM...
TCMeta is a dataset of noun phrase constructions from COVID-related tweets, annotated for relation-level metaphor. It contains 2,138 Slovene and 2,221 English instances in... -
Slovene learner corpus KOST 1.0
The corpus of Slovene as a foreign language KOST (Korpus slovenščine kot tujega jezika) contains 6,311 texts (just over 1 million words) written by adult speakers for whom... -
LOCOLE (Longitudinal Corpus of Learner English)
This corpus comprises essays written by university students of English Philology over the course of one academic year. The essays were collected four... -
Dataset of annotated headword-synonym-distractor triplets SYNDIST
The dataset contains 51,023 headword-synonym-distractor triplets for 5,000 headwords. Distractor is defined as an incorrect answer/alternative to synonym, which can be similar... -
Corpus of Transcriptions - part 2
The second part of the Corpus of Transcriptions contains phonemic transcriptions of a short passage from Lecumberri and Maidment (2000, p. 78) performed by the undergraduate... -
Corpus of Transcriptions - part 1
The first part of the Corpus of Transcriptions contains phonemic transcriptions of a short passage from Lecumberri and Maidment (2000, p. 78) performed by the undergraduate... -
Eesti-inglise paralleelkorpus Estonian-English parallel corpus
Estonian-English parallel corpus. More info at http://www.cl.ut.ee/korpused/paralleel/index.php?lang=en Annotated and sentence-aligned parallel text corpus; contains: 1. Estonian... -
EKI veamärgendatud E2 õppijakorpus (versioon 2) EKI error-annotated Estonian...
The materials of the error-annotated corpus are based on the EMMA learner language corpus and include data from the Education and Youth Board's standard-determining tests (grade 7, 504 texts), basic school final examinations...
