Corpus of spoken Slovenian ROG-Dialog 1.0


The Corpus of spoken Slovenian ROG-Dialog consists of volunteered audio recordings made by students, who asked relatives or acquaintances to talk on record in their homes. The speakers were directed to use various dialogue styles, including instructions, interviews, discussions, storytelling, and chatting. Dialogue topics were chosen freely; the most prevalent include travelling, health, childhood memories, work, technology, food, and entertainment.

The recordings and metadata were uploaded to the Govorjena Slovenščina web portal (https://govorjena-slovenscina.um.si/). The recordings were manually segmented, transcribed at two levels (a colloquial transcription and a standardized orthographic transcription), and annotated with dialogue acts and sentiment.

The 25 speakers in this corpus come from all statistical regions of Slovenia and range in age from 21 to 82 years. The corpus includes speakers from both rural and urban areas. Reflecting this geographic and social diversity, the speech ranges from standard colloquial registers to local dialects, with some speakers using distinct regional varieties.

ROG-Dialog is distributed as:
- EXMARaLDA format (.EXB files) for viewing with Partitur Editor (https://www.exmaralda.org/)
- .EXS files and the Rog-Art.coma file for searching the annotated corpus in the EXMARaLDA EXAKT concordancer (https://www.exmaralda.org/)
- .TRS files for viewing the transcriptions, without annotations, in Transcriber (https://trans.sourceforge.net/en/presentation.php)
- .TXT plain-text files
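Besides opening them in Partitur Editor, the .EXB files can be read programmatically, since an EXMARaLDA basic transcription is XML with a common timeline (`tli` elements carrying times) and tiers of `event` elements that point to timeline items. The following is a minimal sketch using only the Python standard library; the function and class names are hypothetical. Which tiers in ROG-Dialog hold the colloquial transcription, the standardized transcription, dialogue acts, or sentiment is not stated in this record, so inspect the `category` and `display-name` attributes of an actual file before relying on any particular tier.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Event:
    tier: str                 # tier display name (falls back to the tier id)
    category: Optional[str]   # tier "category" attribute, as set in the file
    speaker: Optional[str]    # speaker id referenced by the tier
    start: Optional[float]    # start time in seconds, if the timeline gives one
    end: Optional[float]      # end time in seconds, if the timeline gives one
    text: str                 # event content (leading/trailing space stripped)


def parse_exb(root: ET.Element) -> List[Event]:
    """Collect all events from a parsed EXMARaLDA basic transcription."""
    # Map timeline item ids to absolute times; some items may lack a time.
    times: Dict[str, Optional[float]] = {}
    for tli in root.iter("tli"):
        t = tli.get("time")
        times[tli.get("id")] = float(t) if t is not None else None

    events: List[Event] = []
    for tier in root.iter("tier"):
        name = tier.get("display-name") or tier.get("id")
        for ev in tier.iter("event"):
            events.append(Event(
                tier=name,
                category=tier.get("category"),
                speaker=tier.get("speaker"),
                start=times.get(ev.get("start")),
                end=times.get(ev.get("end")),
                text=(ev.text or "").strip(),
            ))
    return events


def read_exb(path: str) -> List[Event]:
    """Parse an .EXB file from disk (path is whatever file you downloaded)."""
    return parse_exb(ET.parse(path).getroot())
```

For example, `for ev in read_exb("example.exb"): print(ev.tier, ev.start, ev.end, ev.text)` would list every event with its times, where `example.exb` is a placeholder for one of the distributed files. Only the event's own text node is read, so any nested elements inside an event are ignored in this sketch.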

The ROG-Dialog data were compiled to complement the ROG-Artur subcorpus of the ROG 1.0 training corpus of spoken Slovenian (http://hdl.handle.net/11356/1992). However, the two corpora differ in their annotation levels, and harmonising them remains a task for any future merger of the two.

Identifier
PID http://hdl.handle.net/11356/2073
Related Identifier https://govorjena-slovenscina.um.si/
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/2073
Provenance
Creator Verdonik, Darinka; Rupnik, Peter; Vidinić, Jasna; Ljubešić, Nikola
Publisher Faculty of Electrical Engineering and Computer Science, University of Maribor; Jožef Stefan Institute
Publication Year 2025
Rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0); PUB; https://creativecommons.org/licenses/by-sa/4.0/
OpenAccess true
Contact info(at)clarin.si
Representation
Language Slovenian; Slovene
Resource Type corpus
Format application/zip; text/plain; charset=utf-8; downloadable_files_count: 2
Discipline Linguistics
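The Metadata Access field above is an OAI-PMH `GetRecord` request with `metadataPrefix=oai_dc`, so it should return this record as simple Dublin Core wrapped in an OAI-PMH 2.0 envelope. A minimal Python sketch for fetching and flattening it follows; it assumes a standard OAI-PMH 2.0 response, the helper names are hypothetical, and which Dublin Core elements the repository actually fills in is not stated here, so the returned keys are simply whatever the live response contains.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Dict, List

# Namespace URIs defined by the OAI-PMH 2.0 and oai_dc specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# The Metadata Access URL from this record.
RECORD_URL = ("http://www.clarin.si/repository/oai/request?verb=GetRecord"
              "&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/2073")


def parse_dc(xml_bytes: bytes) -> Dict[str, List[str]]:
    """Map each Dublin Core element name to the list of its values."""
    root = ET.fromstring(xml_bytes)
    dc = root.find(".//oai_dc:dc", NS)
    fields: Dict[str, List[str]] = {}
    if dc is None:  # e.g. an OAI-PMH error response with no metadata
        return fields
    for child in dc:
        # child.tag looks like "{http://purl.org/dc/elements/1.1/}title"
        name = child.tag.rsplit("}", 1)[-1]
        fields.setdefault(name, []).append((child.text or "").strip())
    return fields


def fetch_record(url: str = RECORD_URL, timeout: float = 30.0) -> Dict[str, List[str]]:
    """Download the record over HTTP (needs network access) and parse it."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_dc(resp.read())
```

Repeated elements such as multiple creators are kept as separate list entries rather than joined, so `fetch_record().get("creator", [])` would yield one string per creator if the endpoint lists them individually.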