Slovenian parliamentary corpus SlovParl 2.0

PID

The SlovParl corpus contains minutes of the Assembly of the Republic of Slovenia for the legislative period 1990-1992, i.e. it covers the period before, during, and after Slovenia became an independent country in 1991. The corpus comprises 232 sessions, 58,813 speeches and 10.8 million words. The corpus contains extensive meta-data about the speakers, a typology of sessions etc. and structural and editorial annotations.

This item comprises three datasets: - the corpus in TEI (module Transcriptions of speech); - the corpus in TEI with added automatic linguistic annotation: tokenisation, MSD tagging and lemmatisation; - the corpus in vertical format used by various concordancers, e.g. CWB and Sketch Engine; this format is simpler and smaller but does not contain all the information from the source TEI.

The SlovParl data originally come from https://github.com/SIstory/SlovParl, but have been converted to use TEI elements for speech.

The first version of this resource is presented in the paper: Pančur, Andrej. "Označevanje zbirke zapisnikov sej slovenskega parlamenta s smernicami TEI." In the Proceedings of the Conference on Language Technologies & Digital Humanities (Tomaž Erjavec and Darja Fišer, eds.) 142-148. Ljubljana: Znanstvena založba Filozofske fakultete v Ljubljani, 2016.

Identifier
PID http://hdl.handle.net/11356/1167
Related Identifier http://www.sdjt.si/wp/wp-content/uploads/2016/09/JTDH-2016_Pancur_Oznacevanje-zbirke-zapisnikov-sej-slovenskega-parlamenta.pdf
Related Identifier http://lrec-conf.org/workshops/lrec2018/W2/summaries/4_W2.html
Related Identifier http://hdl.handle.net/11356/1075
Related Identifier https://github.com/DARIAH-SI/CLARIN.SI/commit/1cbe75c2ae8c90fba40c167786b6c548a852944c
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1167
Provenance
Creator Pančur, Andrej; Šorn, Mojca; Erjavec, Tomaž
Publisher Institute of Contemporary History
Publication Year 2017
Rights Creative Commons - Attribution 4.0 International (CC BY 4.0); https://creativecommons.org/licenses/by/4.0/; PUB
OpenAccess true
Contact info(at)clarin.si
Representation
Language Slovenian; Slovene
Resource Type corpus
Format application/zip; text/plain; charset=utf-8; downloadable_files_count: 3
Discipline Linguistics