Corpus of texts by Hijacint Repič in "Cvetje z vertov sv. Frančiška" CVET 1.0

Dataset

The CVET corpus contains 230 texts of varying length (around 175 thousand words) published in the religious journal "Cvetje z vertov sv. Frančiška" between 1887 and 1916, when the journal was edited by the linguist Fr. Stanislav Škrabec. The articles are signed with the initials P. H. R. (padre Hijacint Repič) and are original texts, translations or adaptations. Most are devotional and religious articles or hagiographies. The corpus is encoded in TEI in two variants: one contains the TEI-encoded texts, while the other additionally contains automatic linguistic annotations, namely word modernisation, lemmatisation, MULTEXT-East morphosyntactic annotations, and morphological and syntactic annotations according to the Universal Dependencies formalism for Slovenian. In addition to the two TEI-encoded versions, the corpus is also available in two derived formats: plain text in several variants (original, normalised, lemmas; either tokenised or not; in original case or lower case), and the vertical format used by CQP-compatible concordancers, such as noSketchEngine.
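As an illustration of how the vertical format can be consumed outside a concordancer, the following is a minimal Python sketch. It assumes the usual CQP conventions: one token per line, tab-separated positional attributes, and structural tags such as <s> on lines of their own. The column order (word, normalised form, lemma), the function name read_vertical and the sample tokens are assumptions made for this example, not taken from the corpus files; the actual attribute list should be checked in the documentation shipped with the download.

```python
from typing import Dict, Iterable, Iterator, List, Sequence


def read_vertical(
    lines: Iterable[str],
    columns: Sequence[str] = ("word", "norm", "lemma"),  # assumed column order
) -> Iterator[List[Dict[str, str]]]:
    """Yield sentences as lists of token dicts from CQP-style vertical text."""
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        # Structural tags (<text ...>, <s>, </s>, <g/>) sit on their own line.
        if line.startswith("<") and line.endswith(">"):
            if line == "</s>" and sentence:
                yield sentence
                sentence = []
            continue
        # Token line: positional attributes separated by tabs.
        sentence.append(dict(zip(columns, line.split("\t"))))
    if sentence:  # flush tokens that were not closed by </s>
        yield sentence


# Hand-written sample for illustration only, not copied from the corpus.
SAMPLE = [
    '<text id="example">',
    "<s>",
    "z\tz\tz",
    "vertov\tvrtov\tvrt",
    "</s>",
    "</text>",
]

for sent in read_vertical(SAMPLE):
    print([tok["lemma"] for tok in sent])
```

For a real file, the same function can be applied to an open file handle, for example read_vertical(open(path, encoding="utf-8")), where path points to the unpacked vertical file.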

Identifier
PID	http://hdl.handle.net/11356/1226
Metadata Access	http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1226
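The Metadata Access link above is an OAI-PMH GetRecord request returning Dublin Core (oai_dc) metadata. The sketch below shows one way to assemble such a request URL and to pull Dublin Core fields out of a response using only the Python standard library. The helper names get_record_url and dc_fields are hypothetical, and the XML in SAMPLE_RESPONSE is a trimmed, hand-written fragment for illustration, not the repository's actual response.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List
from urllib.parse import urlencode

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace


def get_record_url(base: str, identifier: str, prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH GetRecord URL; ':' and '/' are left unescaped."""
    query = urlencode(
        {"verb": "GetRecord", "metadataPrefix": prefix, "identifier": identifier},
        safe=":/",
    )
    return f"{base}?{query}"


def dc_fields(xml_text: str) -> Dict[str, List[str]]:
    """Collect all Dublin Core elements in a response, grouped by local name."""
    fields: Dict[str, List[str]] = {}
    for el in ET.fromstring(xml_text).iter():
        if el.tag.startswith("{" + DC_NS + "}"):
            name = el.tag.split("}", 1)[1]
            fields.setdefault(name, []).append((el.text or "").strip())
    return fields


# Hand-written fragment for illustration; a real response has more wrappers.
SAMPLE_RESPONSE = """
<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Corpus of texts by Hijacint Repič CVET 1.0</dc:title>
  <dc:identifier>http://hdl.handle.net/11356/1226</dc:identifier>
</record>
"""

print(get_record_url("http://www.clarin.si/repository/oai/request",
                     "oai:www.clarin.si:11356/1226"))
print(dc_fields(SAMPLE_RESPONSE))
```

Fetching the live record would additionally require an HTTP client call such as urllib.request.urlopen on the generated URL, which is left out here so the sketch runs offline.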

Provenance
Creator	Košir, Diana; Erjavec, Tomaž
Publisher	Science and Research Centre Koper
Publication Year	2024
Rights	Creative Commons - Attribution 4.0 International (CC BY 4.0); PUB; https://creativecommons.org/licenses/by/4.0/
OpenAccess	true
Contact	info(at)clarin.si

Representation
Language	Slovenian; Slovene
Resource Type	corpus
Format	text/plain; charset=utf-8; application/zip; downloadable_files_count: 4
Discipline	Linguistics