Dataset - B2FIND

Korpus Czterech Wieszczów

Wiersze wchodzące w skład Korpusu Czterech Wieszczów, zob. Tomasz Korpysz Anna Mędrzecka Ewa Mirkowska Marek Troszyński, „Korpus Czterech Wieszczów” – cyfrowy wymiar...

Carniolan Provincial Assembly corpus Kranjska 1.0

The corpus contains meeting proceedings of the Carniolan Provincial Assembly from 1861 to 1913 (Obravnave deželnega zbora kranjskega / Bericht über die Verhandlungen des...

Slovenian translation corpus Spook 1.1

The Spook corpus was compiled to enable corpus-based studies in translation and comprises 713 texts and about 375 thousand words. It is composed of three types of texts. The...

Digital library and corpus of historical Slovene IMP 1.1

The IMP digital library contains historical Slovene books and other publications, together 658 texts with over 45,000 pages from the period 1584-1919. Each text contains...

Linguistically annotated multilingual comparable corpora of parliamentary deb...

ParlaMint 5.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and...

Multilingual comparable corpora of parliamentary debates ParlaMint 5.0

ParlaMint 5.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and...

Terminological dictionary of papermaking

This digital dictionary of papermaking was made on the basis of the printed edition, i.e. Marjeta Humar (ed.) Papirniški terminološki slovar. 1996. ZRC SAZU...

CMC training corpus Janes-Tag 2.1

Janes-Tag is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is meant as a gold-standard training and testing dataset for tokenisation, sentence...

CMC training corpus Janes-Norm 1.2

Janes-Norm is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is meant as a gold-standard training and testing dataset for tokenisation,...

Slovene-Japanese Learner's Dictionary sloJa 1.1

The Slovenian-Japanese online dictionary for Slovenian speaking learners of Japanese was compiled by extracting and converting the Japanese-Slovenian dictionary jaSlo 3.1...

Comparable corpus of parliamentary debates ParlaMint-ES-CN 1.0

The ParlaMint-ES-CN corpus is the contribution of the Parliament of the Canary Islands (Parlamento de Canarias) to the ParlaMint collection of comparable parliamentary corpora...

Slovene-Japanese Learner's Dictionary sloJa 1.0

The Slovenian-Japanese online dictionary for Slovenian speaking learners of Japanese was compiled by extracting and converting the Japanese-Slovenian dictionary jaSlo 3.1...

Corpus of Slovenian historical legal texts SI-IUS 1.0

The SI-IUS collection of older law texts is meant to be used both as a digital library and as a language corpus. For the former, each text has been carefully annotated in TEI...

Enlightenment Architectures Project Schema

The TEI-conformant schema developed for the Enlightenment Architectures project and used to validate the 5 catalogues (two volumes of ‘fossils’, one volume of printed books and...

The "Mobile languages" corpus MoJezik 1.0 (transcription)

The "Mobile Languages" corpus documents in-depth, semi-structured sociolinguistic interviews with speakers from two Slovene regions and distinctive dialects: Idrija (Cerkno...

Comparable corpus of parliamentary debates ParlaMint-IL 1.0

The ParlaMint-IL corpus is the Israeli contribution to the ParlaMint collection of comparable parliamentary corpora (https://www.clarin.eu/parlamint), which contain...

The corpus of older Slovenian narrative prose PriLit 1.0

The PriLit corpus contains 37 texts of older Slovenian narrative prose by 12 authors. One text, Sreča v nesreči (Fortune in Misfortune) by Janez Cigler (first published in...

Linguistically annotated multilingual comparable corpora of parliamentary deb...

ParlaMint-en.ana 5.0 is the English machine translation of the ParlaMint.ana 5.0 (http://hdl.handle.net/11356/2005) set of corpora of parliamentary debates across Europe. The...

Multilingual comparable corpora of parliamentary debates ParlaMint 4.1

ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and...

Linguistically annotated multilingual comparable corpora of parliamentary deb...

ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and...

141 datasets found