The CVET corpus contains 230 texts (around 175 thousand words) of varying length, published in the religious journal "Cvetje z vertov sv. Frančiška" between 1887 and 1916, when the magazine was edited by the linguist Fr. Stanislav Škrabec. The articles are signed with the initials P. H. R. (padre Hijacint Repič) and are original texts, translations or adaptations. The majority are devotional and religious articles and hagiography.
The corpus is encoded in TEI in two variants: one contains the text without linguistic annotation, while the other adds automatic linguistic annotation comprising word modernisation, lemmatisation, MULTEXT-East morphosyntactic descriptions, and morphological and syntactic annotation according to the Universal Dependencies formalism for Slovenian.
In addition to the two TEI-encoded versions, the corpus is also available in two derived formats. The first is plain text in several variants (original, normalised, or lemmas; tokenised or not; in original case or lower case). The second is the vertical format used by CQP-compatible concordancers, such as noSketchEngine.
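
For readers unfamiliar with the vertical format, the following is a minimal Python sketch of how such a file could be read. It relies only on the general conventions of the format (one token per line, tab-separated attributes, structural tags on their own lines). The function name `parse_vertical`, the `<s>` sentence tag, and the column order (word, normalised form, lemma) are assumptions made for illustration; the actual attribute layout of the CVET files should be checked against the distributed data. The sample uses placeholder tokens, not text from the corpus.

```python
def parse_vertical(text):
    """Return a list of sentences, each a list of token tuples.

    Assumes sentences are delimited by <s> ... </s> lines and that each
    token line holds tab-separated attributes (hypothetical layout).
    """
    sentences = []
    current = None  # tokens of the sentence being read, or None outside <s>
    for line in text.splitlines():
        if not line.strip():
            continue
        # A structural line looks like an XML tag and carries no tab.
        is_tag = line.startswith("<") and line.endswith(">") and "\t" not in line
        if is_tag:
            if line == "<s>" or line.startswith("<s "):
                current = []
            elif line == "</s>" and current is not None:
                sentences.append(current)
                current = None
            continue  # other tags (e.g. <text>) are skipped in this sketch
        if current is not None:
            current.append(tuple(line.split("\t")))
    return sentences


# Placeholder sample, not taken from the corpus.
SAMPLE = (
    '<text id="demo">\n'
    "<s>\n"
    "word1\tnorm1\tlemma1\n"
    "word2\tnorm2\tlemma2\n"
    "</s>\n"
    "</text>\n"
)

sentences = parse_vertical(SAMPLE)
```

Each sentence comes back as a list of tuples, so a lemma-only rendering of the sample would be `[tok[2] for tok in sentences[0]]` under the assumed column order.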