CLARIN - Repositories

Genus (proximum) in the SSKJ2 dictionary senses

The datasets contain sense–genus combinations from the Dictionary of the Slovenian Standard Language, 2nd Edition (Slovar slovenskega knjižnega jezika, druga, dopolnjena in...

Sample from the audiobook "Zate" (For you)

This entry contains the first part of the audiobook "Zate" (For you) by author Marjetica Garzarolli Dharu (COBISS ID: 280622851, ISBN: 978-961-7198-95-9). I am a mother who has...

Sample from the audiobook "Med vsemi njunimi svetovi" (Between all their worlds)

This entry contains the first part of the audiobook "Med vsemi njunimi svetovi" (Between all their worlds) by author Petra Dolenc (COBISS ID: 278854659, ISBN:...

Sample from the audiobook "Poklon Himalaji" (A bow to the Himalayas)

This entry contains the first part of the audiobook "Poklon Himalaji" (A bow to the Himalayas) by author Barbara Popit (COBISS ID: 279688451, ISBN: 978-961-7198-91-1). Two...

Sample from the audiobook "Postal bom gasilec" (I want to become a firefighter)

This entry contains the first part of the audiobook "Postal bom gasilec" (I'll become a firefighter) by author Miha Gril (COBISS ID: 280097795, ISBN: 978-961-7198-94-2). A...

Sample from the audiobook "Lovro in slaba ocena" (Lovro and the bad grade)

This entry contains the first part of the audiobook "Lovro in slaba ocena" (Lovro and the bad grade) by author Leopold Suhodolčan (COBISS ID: 278861827, ISBN:...

Sample from the audiobook "Brut" (Brut)

This entry contains the first part of the audiobook "Brut" (Brut) by author Vesna Kosmač (COBISS ID: 278853635, ISBN: 978-961-7198-87-4). Everything and More About the Karst...

Sample from the audiobook "Življenje na limonadi" (Life with lemonade)

This entry contains the first part of the audiobook "Življenje na limonadi" (Life with lemonade) by author Enet Jogan Klemše (COBISS ID: 279563523, ISBN: 978-961-7198-90-4). If...

Sample from the audiobook "Praktično učenje" (Practical learning)

This entry contains the first part of the audiobook "Praktično učenje" (Practical learning) by author Dejan Krajlah (COBISS ID: 278851331, ISBN: 978-961-291-550-6). Make...

Sample from the audiobook "Bucika Betka in balonček Bruno" (Pinny Petal and B...

This entry contains the first part of the audiobook "Bucika Betka in balonček Bruno" (Pinny Petal and Bruno balloon) by author Borut Gombač (COBISS ID: 280741891, ISBN:...

AI-generated text corpus AI-GenT 1.0

The AI-Generated Text (AI-GenT) corpus is a collection of English and Slovenian texts generated by several large language models. The corpus has been used in comparisons to...

Slovene Lexicographic QA Fine-Tuning Corpus SloLexQA 1.0

The Slovene Lexicographic QA Fine-Tuning Corpus is a specialized dataset designed to advance the performance of AI models in understanding the structural, grammatical, and...

JRC EU DGT Translation Memory Parsebank DGT-UD 1.0

DGT-UD is a 2-billion-word, 23-language, syntactically parsed parallel corpus consisting of the JRC DGT translation memory of European law, automatically annotated with...

OptiQ

The OptiQ project aims to create a database application for a corpus of medieval and early modern texts on the history of optics, which are preserved mainly in manuscripts. The...

UniQ

The UniQ project is dedicated to the Prague struggle over universals ca. 1348–1500, making information about the corresponding corpus of texts available in a digital...

Content-specific classification of historical page images - annotated dataset

This dataset employs a comprehensive 11-label classification scheme to categorize scanned images of document pages. The types are based on their content and presentation format....

Anonymized Questionnaires and Response Data for the Bachelor Thesis "False Fr...

This dataset contains anonymized responses from 60 participants collected as part of the empirical research for the bachelor’s thesis “False Friends in Czech, Ukrainian and...

NameTag 3 Multilingual Model 260521

This is a trained model for the supervised machine learning tool NameTag 3 (https://ufal.mff.cuni.cz/nametag/3/). NameTag 3 is an open-source tool for both flat and nested named...

Slovenian Day of Resistance X & news corpus 1.1

The dataset contains social media posts from X and traditional media articles from online news sources related to the Slovenian commemorations of the Day of Resistance. We used...

Sample from the audiobook "Sam bog naj jo bere" (Let only God read it)

This entry contains the first part of the audiobook "Sam bog naj jo bere" (Let only God read it) by author Alenka Čurin Janžekovič (COBISS ID: 277038339, ISBN:...

5,067 datasets found