Spoken corpus Gos 1.1

Dataset

Gos is a corpus of spoken Slovene containing transcripts of approximately 120 hours of speech recorded in various situations: radio and TV shows, school lessons and lectures, private conversations between friends or within the family, work meetings, consultations, conversations in buying and selling situations, etc. All speech is transcribed in two versions (pronunciation-based spelling and standardized spelling), and the corpus comprises over one million words. It can be searched through the web concordancer, where the corresponding recordings can also be listened to: http://www.korpus-gos.net.

Compared with the previous version, this release corrects some errors in the transcriptions and introduces various changes to the TEI and vertical encodings.

Identifier
PID	http://hdl.handle.net/11356/1438
Related Identifier	http://hdl.handle.net/11356/1040
Related Identifier	http://hdl.handle.net/11356/1771
Related Identifier	http://eng.slovenscina.eu/korpusi/gos
Metadata Access	http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1438
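
The Metadata Access link above is an OAI-PMH GetRecord request. As a minimal sketch, the same URL can be assembled from the handle suffix of the PID; the function name `oai_get_record_url` is hypothetical, and the assumption that other records in this repository follow the same `oai:www.clarin.si:<handle>` identifier pattern is not confirmed by this record.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint taken from the Metadata Access field of this record.
OAI_ENDPOINT = "http://www.clarin.si/repository/oai/request"


def oai_get_record_url(handle_suffix, metadata_prefix="oai_dc"):
    """Build an OAI-PMH GetRecord URL for a handle suffix such as '11356/1438'.

    Hypothetical helper: the identifier pattern is inferred from this
    record's own Metadata Access link.
    """
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": f"oai:www.clarin.si:{handle_suffix}",
    }
    # Keep ':' and '/' unescaped so the result matches the link in the record.
    return f"{OAI_ENDPOINT}?{urlencode(params, safe=':/')}"


url = oai_get_record_url("11356/1438")
query = parse_qs(urlsplit(url).query)
print(url)
```

The response to such a request is an XML document carrying the Dublin Core fields listed in this record; fetching and parsing it is left out here to keep the sketch offline.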

Provenance
Creator	Zwitter Vitez, Ana; Zemljarič Miklavčič, Jana; Krek, Simon; Stabej, Marko; Erjavec, Tomaž
Publisher	Centre for Language Resources and Technologies, University of Ljubljana
Publication Year	2021
Rights	Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0); https://creativecommons.org/licenses/by-nc-sa/4.0/; PUB
OpenAccess	true
Contact	info(at)clarin.si

Representation
Language	Slovenian; Slovene
Resource Type	corpus
Format	text/plain; charset=utf-8; application/zip; downloadable_files_count: 2
Discipline	Linguistics