Dataset - B2FIND

The Trankit model for linguistic processing of written and spoken Slovenian 1.3

This is a retrained Slovenian model for the Trankit v1.1.1 library for multilingual natural language processing (https://pypi.org/project/trankit/), trained on the concatenation...

The Trankit model for linguistic processing of written and spoken Slovenian 1.2

This is a retrained Slovenian model for the Trankit v1.1.1 library for multilingual natural language processing (https://pypi.org/project/trankit/), trained on the concatenation...

Trankit model for SST 2.15 1.1

This is a retrained Slovenian model for the Trankit v1.1.1 library for multilingual natural language processing (https://pypi.org/project/trankit/), trained on the SST treebank...

The Trankit model for linguistic processing of spoken and written Slovenian 1.1

This is a retrained Slovenian model for the Trankit v1.1.1 library for multilingual natural language processing (https://pypi.org/project/trankit/), trained on the concatenation...

The Trankit model for linguistic processing of standard written Slovenian 1.1

This is a retrained Slovenian model for the Trankit v1.1.1 library for multilingual natural language processing (https://pypi.org/project/trankit/), trained on the reference SSJ...

Trankit model for linguistic processing of spoken Slovenian

This is a retrained Slovenian spoken language model for the Trankit v1.1.1 library (https://pypi.org/project/trankit/). It is able to predict sentence segmentation, tokenization,...

The Trankit model for linguistic processing of standard Slovenian

This is a retrained Slovenian standard model for the Trankit v1.1.1 library (https://pypi.org/project/trankit/). It is able to predict sentence segmentation, tokenization,...

Word embeddings CLARIN.SI-embed.sl 1.0

CLARIN.SI-embed.sl contains word embeddings induced from a large collection of Slovene texts composed of existing corpora of Slovene, e.g. GigaFida, Janes, KAS, slWaC, etc. The...

Corpus of written standard Slovene Gigafida 2.2

Gigafida 2.2 is a reference corpus of written Slovene texts published in the period 1990-2018. It comprises daily news, magazines, a selection of web texts (a certain...

Corpus of written standard Slovene Gigafida 2.1

Gigafida 2.1 is a reference corpus of written Slovene texts published in the period 1990-2018. It comprises daily news, magazines, a selection of web texts (a certain...

Corpus of Written Standard Slovene Gigafida 2.0

Gigafida 2.0, with about 1.1 billion words, is a reference corpus of written Slovene text published in the period 1990-2018. It comprises daily news, magazines, a...

Digital library and corpus of historical Slovene IMP 1.1

The IMP digital library contains historical Slovene books and other publications, 658 texts in total with over 45,000 pages from the period 1584-1919. Each text contains...

Morphological Lexicon of Slovene Sloleks 3.1

Sloleks is a reference morphological lexicon of Slovene that was developed to be used in various NLP applications and language manuals. It contains Slovene lemmas, their...

Morphological lexicon Sloleks 3.0

Sloleks is a reference morphological lexicon of Slovene that was developed to be used in various NLP applications and language manuals. It contains Slovene lemmas, their...

Morphological lexicon Sloleks 2.0

Sloleks is the reference morphological lexicon for the Slovenian language, developed to be used in NLP applications and language manuals. Encoded in LMF XML, the lexicon contains...

Morphological lexicon Sloleks 1.2

Sloleks is the reference morphological lexicon for the Slovenian language, developed to be used in NLP applications and language manuals. Encoded in LMF XML, the lexicon contains...

CMC training corpus Janes-Tag 2.1

Janes-Tag is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is meant as a gold-standard training and testing dataset for tokenisation, sentence...

Die Erstellung von Fachgebärdenlexika am Institut für Deutsche Gebärdensprach...

Detailed description of how six corpus-based LSP dictionaries German – German Sign Language (DGS) were produced including elicitation methods, annotation and...

Transkriptionskonventionen im Vergleich

Synopsis of transcription conventions used in six international sign language research projects including annotation tool and tiers in transcripts, divided into conventional...

Synergies between transcription and lexical database building: The case of Ge...

Building a lemmatised corpus of German Sign Language (DGS) using iLex, a relational database and annotation tool; consistent token-type matching (lemmatisation) and quality...

84 datasets found