-
The CLASSLA-StanfordNLP model for lemmatisation of standard Macedonian 1.0
The model for lemmatisation of standard Macedonian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the 1984 training... -
The CLASSLA-StanfordNLP model for lemmatisation of standard Bulgarian 1.1
The model for lemmatisation of standard Bulgarian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the BulTreeBank... -
MULTEXT-East free lexicons 4.0
The MULTEXT-East morphosyntactic lexicons have a simple structure, where each line is a lexical entry with three tab-separated fields: (1) the word-form, the inflected form of... -
Training corpus jos1M 1.1
The jos1M corpus contains 1 million words of sampled paragraphs from the FidaPLUS corpus. It is meant to serve as a training corpus for word-level tagging of Slovene. This... -
Lexicon of historical Slovene imp25k 1.1
The imp25k lexicon of historical Slovene was created automatically from the goo300k and foo3M annotated corpora and contains attested and manually verified word forms and their... -
ReLDI tag+lemma+parse web service for WebLicht
WebLicht (https://weblicht.sfs.uni-tuebingen.de/) registry entry for webservice comprising tokenisation, PoS tagging, lemmatisation and dependency parsing. Tool source files... -
Annotated Corpus of Pre-Standardized Balkan Slavic Literature 1.1
The corpus contains 23 linguistically annotated samples of "damaskini" and other Balkan Slavic manuscripts and print editions from the 15th-19th century, together with over 50... -
The CLASSLA-Stanza model for lemmatisation of standard Serbian 2.1
The model for lemmatisation of standard Serbian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SETimes.SR training corpus... -
Croatian Twitter training corpus ReLDI-NormTag-hr 1.0
ReLDI-NormTag-hr 1.0 is a manually annotated corpus of Croatian tweets. It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation, word... -
Serbian Twitter training corpus ReLDI-NormTagNER-sr 3.0
ReLDI-NormTagNER-sr 3.0 is a manually annotated corpus of Serbian tweets. It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation,... -
Serbian Twitter training corpus ReLDI-NormTag-sr 1.0
ReLDI-NormTag-sr 1.0 is a manually annotated corpus of Serbian tweets. It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation, word... -
The CLASSLA-StanfordNLP model for lemmatisation of standard Croatian 1.2
The model for lemmatisation of standard Croatian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the hr500k training... -
The CLASSLA-StanfordNLP model for lemmatisation of standard Slovenian 1.2
The model for lemmatisation of standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the ssj500k... -
Word embeddings CLARIN.SI-embed.sr 1.0
CLARIN.SI-embed.sr contains word embeddings induced from the srWaC web corpus. The embeddings are based on the skip-gram model of fastText trained on 554,606,544 tokens of... -
Croatian Twitter training corpus ReLDI-NormTagNER-hr 3.0
ReLDI-NormTagNER-hr 3.0 is a manually annotated corpus of Croatian tweets. It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation,... -
The CLASSLA-StanfordNLP model for lemmatisation of non-standard Slovenian 1.1
The model for lemmatisation of non-standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the ssj500k... -
The CLASSLA-Stanza model for lemmatisation of non-standard Slovenian 2.1
This model for lemmatisation of non-standard Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SUK training corpus... -
The CLASSLA-StanfordNLP model for lemmatisation of standard Bulgarian 1.0
The model for lemmatisation of standard Bulgarian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the BulTreeBank... -
The CLASSLA-Stanza model for lemmatisation of standard Slovenian 2.0
This model for lemmatisation of standard Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SUK training corpus... -
The CLASSLA-Stanza model for lemmatisation of non-standard Serbian 2.1
The model for lemmatisation of non-standard Serbian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SETimes.SR training corpus...