NeMo Conformer CTC BPE E2E Automated Speech Recognition service RSDO-DS2-ASR-...
Automated Speech Recognition service for NeMo Conformer CTC BPE E2E models. For more details about building such models, see the official NVIDIA NeMo documentation...
ccGigafida ARPA language model 1.0
The ccGigafida ARPA language model was created from the ccGigafida written corpus of Slovenian (https://www.clarin.si/repository/xmlui/handle/11356/1035) using the KenLM...
Corpus extraction tool LIST 1.3
The LIST corpus extraction tool is a Java program for extracting lists from text corpora on the levels of characters, word parts, words, and word sets. It supports VERT and TEI...
Slovenian RoBERTa contextual embeddings model: SloBERTa 2.0
The monolingual Slovene RoBERTa (A Robustly Optimized BERT Pretraining Approach) model is a state-of-the-art model representing words/tokens as...
CorefUD conversion of Slovene corpus for aspect-based sentiment analysis Sent...
This corpus is the CorefUD conversion of the SentiCoref corpus for coreference resolution in Slovene contained within the SUK 1.1 collection of corpora...
CorefUD conversion of Slovene coreference resolution corpus coref149
This corpus is the CorefUD conversion of the coref149 corpus for coreference resolution in Slovene (http://hdl.handle.net/11356/1182). It contains 149 documents annotated with...
ASR database ARTUR 0.1 (transcriptions)
ARTUR is a speech database designed for the needs of automatic speech recognition for the Slovenian language. The database includes 1,035 hours of speech, although only 840...
Automatically stress labelled morphological lexicon Sloleks 1.2, version 1.1
This lexicon is an extended version of Sloleks 1.2 (http://hdl.handle.net/11356/1039). It contains all the original data from Sloleks with added information about the stress of...
Character-level part-of-speech tagger of Slovene language
A part-of-speech tagger for the Slovene language, implemented using convolutional and LSTM neural networks. The tagger uses a character-level representation of sentences. The tagger has been...
ASR database ARTUR 0.1 (audio)
ARTUR is a speech database designed for the needs of automatic speech recognition for the Slovenian language. The database includes 1,035 hours of speech, although only 840...
Automatically stress labelled morphological lexicon Sloleks 1.2
This lexicon is an extended version of Sloleks 1.2 (http://hdl.handle.net/11356/1039). It contains all the original data from Sloleks with added information about the stress of...
Slovene coreference resolution corpus coref149
This corpus contains a subset of the ssj500k v1.4 corpus (http://hdl.handle.net/11356/1052). Each of the 149 documents contains a paragraph from ssj500k that contains at least 100...
Dataset of Slovene idiomatic expressions SloIE
SloIE is a manually labelled dataset of Slovene idiomatic expressions. It contains 29,400 sentences with 75 different expressions that can occur with either a literal or an...
ASR database ARTUR 1.0 (audio)
Artur 1.0 is a speech database designed for the needs of automatic speech recognition for the Slovenian language. The database includes 1,067 hours of speech. 884 hours are...
Slovene Conformer CTC BPE E2E Automated Speech Recognition model RSDO-DS2-ASR...
This Conformer CTC BPE E2E Automated Speech Recognition model was trained following the NVIDIA NeMo Conformer-CTC recipe (for details see the official NVIDIA NeMo NMT...
Summarization datasets from the KAS corpus KAS-Sum 1.0
Summarization datasets were created from the text bodies in the KAS 2.0 corpus (http://hdl.handle.net/11356/1448) and the abstracts from the KAS-Abs 2.0 corpus...
Dependency tree extraction tool STARK 1.0
STARK is a Python-based command-line tool for extracting dependency trees from parsed corpora, aimed at corpus-driven linguistic investigations of syntactic phenomena of...
Corpus of academic Slovene KAS 2.0
The KAS corpus of Slovene academic writing consists of almost 65,000 BSc/BA, 16,000 MSc/MA and 1,600 PhD theses (82 thousand texts, 5 million pages or 1.5 billion tokens)...
English translation of the Slovene Natural Language Inference Dataset SI-NLI-...
SI-NLI-en is an English translation of the SI-NLI Slovene Natural Language Inference Dataset (http://hdl.handle.net/11356/1707). The English version was compiled by first using...
Opinion corpus of Slovene web commentaries KKS 1.001
The corpus of web commentaries with sentiment categorizations was developed as part of a BSc thesis (Kadunc, 2016) and was used to evaluate the Slovene Sentiment Lexicon KSS...
