CLARIN - Repositories

Phonetic segmentation and acoustic measurements of spoken Slovenian SloPhonSe...

SloPhonSeg 1.0 is a dataset of automatically generated phonetic segmentations and acoustic-phonetic measurements for selected recordings and transcriptions from the spoken...

Sample from the audiobook "Gelika" (Gelika)

This entry contains the first part of the audiobook "Gelika" (Gelika) by author Ema Golčer (COBISS ID: 277477635, ISBN: 978-961-291-546-9).

Samples from the audiobook "Odprejo se vrata" (The doors open)

This entry contains the first part of the audiobook "Odprejo se vrata" (The doors open) by author Nina Mav Hrovat (COBISS ID: 271810819, ISBN: 978-961-291-534-6). Ana is a...

Samples from the audiobook "Kalne vode" (Murky waters)

This entry contains the first part of the audiobook "Kalne vode" (Murky waters) by author Tone Frelih (COBISS ID: 277070851, ISBN: 978-961-291-544-5). A dark story intertwining...

Sample from the audiobook "Lističi" (Leaflets)

This entry contains the first part of the audiobook "Lističi" (Leaflets) by author Milan Dekleva (COBISS ID: 277486595, ISBN: 978-961-291-547-6). In the background of this...

Sample from the audiobook "En korak, en utrip srca" (One step, one heartbeat)

This entry contains the first part of the audiobook "En korak, en utrip srca" (One step, one heartbeat) by author Leopold Suhodolčan (COBISS ID: 277539843, ISBN:...

Sample from the audiobook "Cesar Arnulf" (Emperor Arnulf)

This entry contains the first part of the audiobook "Cesar Arnulf" (Emperor Arnulf) by author Leopold Suhodolčan (COBISS ID: 277489667, ISBN: 978-961-291-548-3).

Sample from the audiobook "Potovanje slona Jumba" (The journey of Jumbo the ...

This entry contains the first part of the audiobook "Potovanje slona Jumba" (The journey of Jumbo the Elephant) by author Leopold Suhodolčan (COBISS ID: 274658819, ISBN:...

Samples from the audiobook "Moj prijatelj Jumbo" (My friend Jumbo)

This entry contains the first part of the audiobook "Moj prijatelj Jumbo" (My friend Jumbo) by author Leopold Suhodolčan (COBISS ID: 269241091, ISBN: 978-961-7194-51-7). In...

Sample from the audiobook "Z vami se igra Krojaček Hlaček" (Krojaček Hlaček ...

This entry contains the first part of the audiobook "Z vami se igra Krojaček Hlaček" (Krojaček Hlaček plays with you) by author Leopold Suhodolčan (COBISS ID: 269242883, ISBN:...

Sample from the audiobook "Kuža Luža" (Puddle the puppy)

This entry contains the first part of the audiobook "Kuža Luža" (Puddle the puppy) by author Leopold Suhodolčan (COBISS ID: 269240835, ISBN: 978-961-7194-50-0). The theme of the...

Sample from the audiobook "Peter nos in velike čarovnije" (Peter nose and th...

This entry contains the first part of the audiobook "Peter nos in velike čarovnije" (Peter nose and the great magic) by author Leopold Suhodolčan (COBISS ID: 268061699, ISBN:...

Monitor corpus of Slovene Trendi 2026-03

The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 60 publishers. Trendi 2026-02 covers the period from January...

Overview of inflectional paradigms in Slovenian

This overview aims to give a comprehensive account of the inflectional features associated with specific endings. Each ending has a dedicated row in the table...

Database of the Western South Slavic Verb HyperVerb -- Derivation

The verbal Western South Slavic database (WeSoSlaV) contains the 3,000 most frequent Slovenian verbs and the 5,300 most frequent BCS verbs, all of which are coded for a number of properties related...

Word embeddings CLARIN.SI-embed.sl 2.0

CLARIN.SI-embed.sl contains word embeddings induced from a large collection of Slovene texts composed of existing corpora of Slovene, e.g. GigaFida, Janes, KAS, slWaC, MaCoCu-sl,...

SimLex-999 Slovenian translation SimLex-999-sl 1.0

The resource contains the English SimLex-999 word pairs (Hill et al. 2015) and their Slovene translations. In the translation process, the word pairs were first translated by two translators...

Word embeddings CLARIN.SI-embed.sl 1.0

CLARIN.SI-embed.sl contains word embeddings induced from a large collection of Slovene texts composed of existing corpora of Slovene, e.g. GigaFida, Janes, KAS, slWaC, etc. The...

ccGigafida ARPA language model 1.0

The ccGigafida ARPA language model was created from the ccGigafida written corpus of Slovenian (https://www.clarin.si/repository/xmlui/handle/11356/1035) using the KenLM...

Nikolay Nevskiy's Dictionary of Miyako-Ryukyuan

The present transcript reflects Nikolay Nevskiy’s lexical fieldwork notes, which he produced in the 1920s as a result of his fieldwork in the Miyako islands, Japan. The transcript is...
