-
Spoken corpus Gos VideoLectures 1.0 (audio)
Gos VideoLectures is an add-on to the Gos reference speech corpus of Slovene (http://hdl.handle.net/11356/1040), and covers public academic speech. The Gos VideoLectures... -
Frequency lists of word parts from the GOS 1.0 corpus
Frequency lists of words split into word parts were extracted from the GOS 1.0 Corpus of Spoken Slovene (http://hdl.handle.net/11356/1040) using the LIST corpus extraction tool... -
Gos corpus n-grams 2.0
A collection of n-grams extracted from the Gos corpus of spoken Slovene (cf. http://eng.slovenscina.eu/korpusi/gos). Three sets of n-gram lists are provided for lowercased word... -
Frequency lists of word-level n-grams from the GOS 1.0 corpus 1.1
Frequency lists of word-level n-grams (or word sets) were extracted from the GOS 1.0 Corpus of Spoken Slovene (http://hdl.handle.net/11356/1040) using the LIST corpus extraction... -
Frequency lists of words from the GOS 1.0 corpus
Frequency lists of words were extracted from the GOS 1.0 Corpus of Spoken Slovene (http://hdl.handle.net/11356/1040) using the LIST corpus extraction tool... -
ASR database ARTUR 1.0 (audio)
Artur 1.0 is a speech database designed for the needs of automatic speech recognition for the Slovenian language. The database includes 1,067 hours of speech. 884 hours are... -
Frequency lists of word parts from the GOS 1.0 corpus 1.1
Frequency lists of words split into word parts were extracted from the GOS 1.0 Corpus of Spoken Slovene (http://hdl.handle.net/11356/1040) using the LIST corpus extraction tool... -
Multimodal corpus EVA 1.0
EVA Corpus 1.0 consists of one episode of an audio/video session plus corresponding orthographic transcriptions with a duration of 57 minutes. The multi-party spontaneous... -
Frequency lists of words from the GOS 1.0 corpus 1.1
Frequency lists of words were extracted from the GOS 1.0 Corpus of Spoken Slovene (http://hdl.handle.net/11356/1040) using the LIST corpus extraction tool... -
Spoken corpus Gos VideoLectures 1.0 (transcription)
Gos VideoLectures is an add-on to the Gos reference speech corpus of Slovene (http://hdl.handle.net/11356/1040), and covers public academic speech. The Gos VideoLectures... -
Spoken corpus Gos VideoLectures 2.0 (transcription)
Gos VideoLectures is an add-on to the Gos reference corpus of spoken Slovene (http://hdl.handle.net/11356/1040), and covers public academic speech. The Gos VideoLectures corpus... -
Frequency lists of character-level n-grams from the GOS 1.0 corpus
Frequency lists of character-level n-grams were extracted from the GOS 1.0 Corpus of Spoken Slovene (http://hdl.handle.net/11356/1040) using the LIST corpus extraction tool... -
Vystadial 2013 – English data
Vystadial 2013 is a dataset of telephone conversations in English and Czech, developed for training acoustic models for automatic speech recognition in spoken dialogue systems.... -
Vystadial 2013 – Czech data
Vystadial 2013 is a dataset of telephone conversations in English and Czech, developed for training acoustic models for automatic speech recognition in spoken dialogue systems.... -
Spoken corpus of Karel Makoň
Talks given by Karel Makoň to his friends from the late 1960s through the early 1990s. The topic is mostly Christian mysticism. -
Prague Dependency Treebank - Consolidated 1.0 (PDT-C 1.0)
A richly annotated and genre-diversified language resource, the Prague Dependency Treebank – Consolidated 1.0 (PDT-C 1.0, hereafter PDT-C) is a consolidated... -
Prague Dependency Treebank of Spoken Czech 2.0 (PDTSC 2.0)
The Prague Dependency Treebank of Spoken Czech 2.0 (PDTSC 2.0) is a corpus of spoken language, consisting of 742,316 tokens and 73,835 sentences, representing 7,324 minutes... -
Vystadial 2016 – Czech data
This is the Czech data collected during the VYSTADIAL project. It is an extension of the Czech part of the 'Vystadial 2013' data release. The dataset comprises telephone... -
Prague DaTabase of Spoken Czech 1.0
PDTSC 1.0 is a multi-purpose corpus of spoken language. 768,888 tokens, 73,374 sentences and 7,324 minutes of spontaneous dialog speech have been recorded, transcribed and...