Dataset - B2FIND

Slovenian RoBERTa contextual embeddings model: SloBERTa 1.0

The monolingual Slovene RoBERTa (A Robustly Optimized Bidirectional Encoder Representations from Transformers) model is a state-of-the-art model representing words/tokens as...

Corpus of academic Slovene KAS 2.0

The KAS corpus of Slovene academic writing consists of almost 65,000 BSc/BA, 16,000 MSc/MA and 1,600 PhD theses (82 thousand texts, 5 million pages or 1.5 billion tokens)...

2 datasets found