-
Albanian Spoken Corpus in Kosovo 1.0
This is the third version of a spoken corpus of Albanian in Kosovo. The corpus is based on short life stories of 212 informants out of a sample of 1,800 speakers... -
ASR database ARTUR 0.1 (transcriptions)
ARTUR is a speech database designed for the needs of automatic speech recognition for the Slovenian language. The database includes 1,035 hours of speech, although only 840... -
ASR database ARTUR 0.1 (audio)
ARTUR is a speech database designed for the needs of automatic speech recognition for the Slovenian language. The database includes 1,035 hours of speech, although only 840... -
ASR database ARTUR 1.0 (audio)
ARTUR 1.0 is a speech database designed for the needs of automatic speech recognition for the Slovenian language. The database includes 1,067 hours of speech. 884 hours are... -
ASR database ARTUR 1.0 (transcriptions)
ARTUR 1.0 is a speech database designed for the needs of developing automatic speech recognition for the Slovenian language. The complete database includes 1,067 hours of... -
List of formulaic sequences in spoken Slovenian
This document contains 2,374 formulaic sequences in spoken Slovenian, i.e. frequently recurring strings of two to five words, manually annotated for syntactic structure,... -
Corpus of metaphorical expressions in spoken Slovene language G-KOMET 1.0
G-KOMET (a corpus of metaphorical expressions in spoken Slovene language) is an upgrade of the hand-annotated written corpus for metaphorical expressions KOMET... -
A Digital Dictionary of Tunis Arabic - TUNICO (ELEXIS)
A corpus-based dictionary, enriched with historical data. The dictionary was not only built on data from the corpus of spoken language that was compiled in the same project, but... -
TED-ELH Parallel Corpus
The corpus contains parallel aligned scripts of TED Talks in English, Lithuanian, and Hebrew, representing spoken language data.
