Collection of Slovenian paremiological units Pregovori 1.0

Dataset


This corpus collects and annotates the extensive and highly valuable diachronic collection of Slovenian proverbs compiled over more than 50 years at the ZRC SAZU Institute of Slovenian Ethnology. It consists of a structured bibliography of 2,515 items (1578-2010) that served as the sources of the proverbs (printed books, journals, calendars, collecting campaigns in various journals, folklore fieldwork, personal notes, etc.), together with the collection of paremiological units itself. Each paremiological unit is represented in two ways: as a diplomatic transcription of the source (because of technical difficulties faced by the transcribers and human error, the transcription of older texts is inconsistent), and as a critical transcription, which normalises the alphabet.
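To give a feel for what "normalising the alphabet" can involve, the sketch below transliterates lowercase text from the Bohorič alphabet, used in many older Slovenian prints, into the modern Gaj alphabet. It assumes Bohorič input and uses a hypothetical helper, `bohoric_to_gaj`; it is a toy illustration, not the project's actual normalisation rules, and it ignores capitals and ambiguous cases such as an `s` followed by an `h` that belongs to the next morpheme.

```python
import re

# Assumed Bohorič-to-Gaj correspondences (lowercase only). Digraphs come
# before single letters so the regex alternation prefers the longest match.
BOHORIC_TO_GAJ = {
    "ſh": "š",
    "sh": "ž",
    "zh": "č",
    "ſ": "s",
    "s": "z",
    "z": "c",
}

_PATTERN = re.compile("|".join(map(re.escape, BOHORIC_TO_GAJ)))


def bohoric_to_gaj(text: str) -> str:
    """Transliterate lowercase Bohorič spelling into the Gaj alphabet in one pass.

    A single regex pass ensures that, e.g., an 's' already mapped to 'z'
    is not mapped again to 'c'.
    """
    return _PATTERN.sub(lambda m: BOHORIC_TO_GAJ[m.group(0)], text)
```

For example, `bohoric_to_gaj("zhlovek")` returns `"človek"` and `bohoric_to_gaj("ſhola")` returns `"šola"`.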

The words of the critical transcriptions have also been automatically modernised to contemporary spelling, and the modernised words have been further annotated with lemmas, MULTEXT-East morphosyntactic descriptions (MSDs) and Universal Dependencies using the CLASSLA toolchain.
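MULTEXT-East MSDs are positional tags: the first character gives the part of speech and each following character encodes one attribute. A minimal sketch for nouns only, assuming the English-code Slovene noun specification (type, gender, number, case, animacy) as recalled here; the attribute table should be checked against the official MULTEXT-East specification, and the corpus may instead use the Slovene-localised tag codes. `decode_noun_msd` is a hypothetical helper.

```python
# Assumed MULTEXT-East (English-code) attribute values for Slovene nouns,
# listed in positional order after the category letter 'N'.
NOUN_ATTRIBUTES = [
    ("type", {"c": "common", "p": "proper"}),
    ("gender", {"m": "masculine", "f": "feminine", "n": "neuter"}),
    ("number", {"s": "singular", "d": "dual", "p": "plural"}),
    ("case", {"n": "nominative", "g": "genitive", "d": "dative",
              "a": "accusative", "l": "locative", "i": "instrumental"}),
    ("animate", {"n": "no", "y": "yes"}),
]


def decode_noun_msd(msd: str) -> dict:
    """Expand a positional noun MSD such as 'Ncfsn' into attribute-value pairs."""
    if not msd.startswith("N"):
        raise ValueError(f"not a noun MSD: {msd!r}")
    features = {"category": "noun"}
    for (name, values), code in zip(NOUN_ATTRIBUTES, msd[1:]):
        if code != "-":  # '-' marks an attribute that does not apply
            features[name] = values[code]
    return features
```

Under these assumptions, `decode_noun_msd("Ncfsn")` yields a common, feminine, singular, nominative noun; trailing attributes that are absent from the tag are simply omitted.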

The canonical encoding of the corpus is TEI, but the corpus is also distributed in two derived encodings: the bibliography and the sayings as two TSV files, and a vertical file, as used by CQP-type concordancers such as Sketch Engine.
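In a vertical file each token sits on its own line with tab-separated positional attributes, while structural markup such as `<text>` or `<s>` occupies lines of its own. A minimal reader sketch, assuming a hypothetical attribute order of word, lemma and MSD (the real column order should be taken from the corpus documentation); `read_vertical` is an illustrative name, not part of any distributed tooling.

```python
from typing import Iterable, List, Dict, Optional

# Hypothetical order of positional attributes; adjust to the actual file.
DEFAULT_ATTRS = ("word", "lemma", "msd")


def read_vertical(lines: Iterable[str], attrs=DEFAULT_ATTRS) -> List[List[Dict[str, str]]]:
    """Group token lines of a vertical file into sentences delimited by <s> ... </s>."""
    sentences: List[List[Dict[str, str]]] = []
    current: Optional[List[Dict[str, str]]] = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("<") and line.endswith(">"):
            parts = line[1:-1].strip("/").split()
            name = parts[0] if parts else ""
            if name == "s" and not line.startswith("</"):
                if current:  # flush tokens that appeared outside any <s>
                    sentences.append(current)
                current = []
            elif name == "s":  # closing </s>
                if current is not None:
                    sentences.append(current)
                current = None
            continue  # other structures (<text>, <p>, <g/>) are skipped
        if current is None:
            current = []
        current.append(dict(zip(attrs, line.split("\t"))))
    if current:
        sentences.append(current)
    return sentences
```

Passing an open file object works directly, since it iterates line by line. The two TSV files can be read in a similar spirit with Python's `csv.DictReader(f, delimiter="\t")`; their column names are not listed in this record.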

Identifier
PID	http://hdl.handle.net/11356/1455
Related Identifier	http://hdl.handle.net/11356/1853
Related Identifier	https://isn2.zrc-sazu.si/en/programi-in-projekti/traditional-paremiological-units-in-dialogue-with-contemporary-use
Metadata Access	http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1455
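The Metadata Access link is an OAI-PMH `GetRecord` request returning the record in Dublin Core (`oai_dc`). A minimal sketch of building that request and pulling one Dublin Core element out of the response, using only the Python standard library; the helper names are hypothetical, and `fetch_record` needs network access.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI_ENDPOINT = "http://www.clarin.si/repository/oai/request"
DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core elements namespace


def getrecord_url(identifier: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH GetRecord request URL for the CLARIN.SI endpoint."""
    query = urllib.parse.urlencode(
        {"verb": "GetRecord", "metadataPrefix": metadata_prefix, "identifier": identifier},
        safe=":/",  # keep the OAI identifier unescaped, as in the listed URL
    )
    return f"{OAI_ENDPOINT}?{query}"


def dc_field(record_xml, field: str) -> list:
    """Return all values of one Dublin Core element (e.g. 'title') in an oai_dc response."""
    root = ET.fromstring(record_xml)
    return [el.text or "" for el in root.iter(f"{{{DC_NS}}}{field}")]


def fetch_record(identifier: str) -> bytes:
    """Download the raw OAI-PMH response (requires network access)."""
    with urllib.request.urlopen(getrecord_url(identifier), timeout=30) as resp:
        return resp.read()
```

`getrecord_url("oai:www.clarin.si:11356/1455")` reproduces the Metadata Access URL above, and `dc_field(fetch_record("oai:www.clarin.si:11356/1455"), "title")` should list the record's titles.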

Provenance
Creator	Babič, Saša; Peče, Miha; Erjavec, Tomaž; Ivančič Kutin, Barbara; Šrimpf Vendramin, Katarina; Kropej Telban, Monika; Jakop, Nataša; Stanonik, Marija
Publisher	ZRC SAZU; Jožef Stefan Institute
Publication Year	2022
Rights	CLARIN.SI Licence ACA ID-BY-NC-INF-NORED 1.0; https://clarin.si/repository/xmlui/page/licence-aca-id-by-nc-inf-nored-1.0; ACA
OpenAccess	true
Contact	info(at)clarin.si

Representation
Language	Slovenian; Slovene
Resource Type	corpus
Format	text/plain; charset=utf-8; application/zip; 3 downloadable files
Discipline	Linguistics