-
Beserman multimedia corpus
Beserman multimedia corpus This deposit contains transcriptions of monologues and conversations in spoken Beserman (formerly classified as a dialect of Udmurt, ISO 639-2 code... -
Beserman multimedia corpus (sound/video)
Beserman multimedia corpus This deposit contains monologues and conversations in spoken Beserman (formerly classified as a dialect of Udmurt, ISO 639-2 code udm). It contains... -
Spoken corpus of Udmurt dialects
This deposit contains transcriptions of oral interviews and conversations in various dialects of Udmurt (Permic < Uralic; ISO 639-2 code udm). It contains 25 recordings with... -
Prague Dependency Treebank - Consolidated 2.0 (PDT-C 2.0)
A manually annotated and genre-diversified language resource with rich linguistic information from morphology and syntax to semantics, the Prague Dependency Treebank –... -
Prague Dependency Treebank - Consolidated 1.0 (PDT-C 1.0)
A richly annotated and genre-diversified language resource, The Prague Dependency Treebank – Consolidated 1.0 (PDT-C 1.0, or PDT-C in short in the sequel) is a consolidated... -
The Kola Peninsula Spoken Corpus (KoPeSC) 1: Spoken Corpus to “Речь поморов Т...
The Kola Peninsula Spoken Corpus (KoPeSC) is a dataset of sound recordings and their transcriptions in ELAN of Pomor Russian dialect speech and of Sámi and Russian speech as... -
Frequency lists of words from the GOS 1.0 corpus
Frequency lists of words were extracted from the GOS 1.0 Corpus of Spoken Slovene (http://hdl.handle.net/11356/1040) using the LIST corpus extraction tool... -
Frequency lists of words from the GOS 1.0 corpus 1.1
Frequency lists of words were extracted from the GOS 1.0 Corpus of Spoken Slovene (http://hdl.handle.net/11356/1040) using the LIST corpus extraction tool... -
Training corpus of spoken Slovenian ROG 1.0
Training corpus of spoken Slovenian ROG 1.0 is the main resource for Slovenian language to train and evaluate technologies aimed at processing speech or speech transcripts, such... -
Spoken corpus Gos 1.1
Gos is a corpus of spoken Slovene that includes the transcripts of approximately 120 hours of speech recorded in various situations: radio and TV shows, school lessons and... -
Spoken corpus Gos 1.0
GOS is a corpus of spoken Slovene that includes the transcripts of approximately 120 hours of speech recorded in various situations: radio and TV shows, school lessons and... -
Spoken corpus Berta
The Berta Spoken Corpus contains six hours of recorded speech across a variety of interactional settings. These settings include 57 different speech events, with some captured... -
Spoken corpus Gos VideoLectures 4.0 (audio)
Gos VideoLectures is an add-on to the Gos reference corpus of spoken Slovene (http://hdl.handle.net/11356/1040), and covers public academic speech. The Gos VideoLectures corpus... -
Dialogue act annotated spoken corpus GORDAN 1.0 (transcription)
The GORDAN 1.0 corpus contains authentic data of spoken communication, annotated for dialogue acts according to the GORDAN 1.0 dialogue act annotation scheme, included in the... -
Spoken corpus Gos VideoLectures 2.0 (transcription)
Gos VideoLectures is an add-on to the Gos reference corpus of spoken Slovene (http://hdl.handle.net/11356/1040), and covers public academic speech. The Gos VideoLectures corpus... -
Multimodal corpus EVA 1.0
EVA Corpus 1.0 consists of one episode of an audio/video session plus corresponding orthographic transcriptions with a duration of 57 minutes. The multi-party spontaneous... -
Gos corpus n-grams 2.0
A collection of n-grams extracted from the Gos corpus of spoken Slovene (cf. http://eng.slovenscina.eu/korpusi/gos). Three sets of n-gram lists are provided for lowercased word... -
Frequency lists of word parts from the GOS 1.0 corpus
Frequency lists of words split into word parts were extracted from the GOS 1.0 Corpus of Spoken Slovene (http://hdl.handle.net/11356/1040) using the LIST corpus extraction tool... -
Frequency lists of character-level n-grams from the GOS 1.0 corpus
Frequency lists of character-level n-grams were extracted from the GOS 1.0 Corpus of Spoken Slovene (http://hdl.handle.net/11356/1040) using the LIST corpus extraction tool... -
ASR database ARTUR 1.0 (transcriptions)
Artur 1.0 is a speech database designed for the needs of developing automatic speech recognition for the Slovenian language. The complete database includes 1,067 hours of...