Collection of Slovenian paremiological units Pregovori 1.0

This corpus collects and annotates an extensive and valuable diachronic collection of Slovenian proverbs, compiled over more than 50 years at the ZRC SAZU Institute of Slovenian Ethnology. It consists of two parts: a structured bibliography of 2,515 items (1578-2010) that served as the sources of the proverbs, including printed books, journals, calendars, collecting campaigns in various journals, folklore fieldwork, and personal notes; and the collection of paremiological units itself. Each unit is represented in two ways: as a diplomatic transcription from the source collection (the transcription of older texts is inconsistent, owing to technical difficulties faced by the transcribers and to human error) and as a critical transcription that normalizes the alphabet.

The words of the critical transcriptions have also been automatically modernised to contemporary spelling and further annotated with lemmas, MULTEXT-East MSDs (morphosyntactic descriptions), and Universal Dependencies using the CLASSLA toolchain.

The canonical encoding of the corpus is TEI, but the corpus is also distributed in two derived encodings: the bibliography and the sayings as two TSV files, and a vertical file of the kind used by CQP-type concordancers such as Sketch Engine.
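
For readers unfamiliar with the vertical format, the following is a minimal Python sketch of how such a file is typically read: one token per line with tab-separated attributes, and structural elements on their own lines as XML-like tags. The column layout (word form, then lemma), the tag names, and the sample text are assumptions made up for illustration; they are not taken from the distributed files, whose actual columns should be checked in the corpus documentation. The function name `read_vertical` is likewise hypothetical.

```python
import io

# Invented sample in the general vertical style (NOT copied from the corpus):
# structural tags on their own lines, one token per line, tab-separated columns.
# The assumed columns here are word form and lemma only.
SAMPLE = (
    '<text id="hypothetical.1">\n'
    "<s>\n"
    "Kdor\tkdor\n"
    "čaka\tčakati\n"
    "</s>\n"
    "</text>\n"
)


def read_vertical(stream):
    """Yield one sentence at a time as a list of token rows.

    Each token row is the list of tab-separated columns from one line.
    Any line that starts with '<' and ends with '>' is treated as a
    structural tag; a closing </s> tag ends the current sentence.
    This simple test would misread a token line that itself begins with
    '<' and ends with '>', so it is only suitable as a sketch.
    """
    sentence = []
    for raw in stream:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("<") and line.endswith(">"):
            if line == "</s>" and sentence:
                yield sentence
                sentence = []
            continue
        sentence.append(line.split("\t"))
    if sentence:
        # Emit trailing tokens if the input lacks a final </s>.
        yield sentence


sentences = list(read_vertical(io.StringIO(SAMPLE)))
word_forms = [row[0] for row in sentences[0]]
```

To apply the same function to a real file, an open file handle (for example `open(path, encoding="utf-8")`) can be passed in place of the `io.StringIO` object, with the column indices adjusted to the actual layout.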

Identifier
PID http://hdl.handle.net/11356/1455
Related Identifier https://isn2.zrc-sazu.si/en/programi-in-projekti/traditional-paremiological-units-in-dialogue-with-contemporary-use
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1455
Provenance
Creator Babič, Saša; Peče, Miha; Erjavec, Tomaž; Ivančič Kutin, Barbara; Šrimpf Vendramin, Katarina; Kropej Telban, Monika; Jakop, Nataša; Stanonik, Marija
Publisher ZRC SAZU; Jožef Stefan Institute
Publication Year 2022
Rights CLARIN.SI Licence ACA ID-BY-NC-INF-NORED 1.0; https://clarin.si/repository/xmlui/page/licence-aca-id-by-nc-inf-nored-1.0; ACA
OpenAccess true
Contact info(at)clarin.si
Representation
Language Slovenian; Slovene
Resource Type corpus
Format text/plain; charset=utf-8; application/zip; downloadable_files_count: 3
Discipline Linguistics