- The CLASSLA-Stanza model for lemmatisation of non-standard Croatian 2.1
The model for lemmatisation of non-standard Croatian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the hr500k training corpus...
- The CLASSLA-Stanza model for lemmatisation of standard Croatian 2.1
The model for lemmatisation of standard Croatian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the hr500k training corpus...
- The CLASSLA-StanfordNLP model for lemmatisation of non-standard Slovenian 1.0
The model for lemmatisation of non-standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the ssj500k...
- Ekspress news article archive (in Estonian and Russian) 1.0
The dataset is an archive of articles from the Ekspress Meedia news site from 2009-2019, containing over 1.4M articles, mostly in Estonian (1,115,120 articles) with...
- Morphological lexicon Sloleks 2.0
Sloleks is the reference morphological lexicon for the Slovenian language, developed to be used in NLP applications and language manuals. Encoded in LMF XML, the lexicon contains...
- The CLASSLA-StanfordNLP model for lemmatisation of standard Serbian
The model for lemmatisation of standard Serbian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the SETimes.SR...
- The CLASSLA-StanfordNLP model for lemmatisation of standard Croatian
The model for lemmatisation of standard Croatian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the hr500k training...
- The CLASSLA-StanfordNLP model for lemmatisation of standard Slovenian 1.4
The model for lemmatisation of standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the ssj500k...
- Beseda Corpus Lemmatisation Lexicon
Beseda Corpus Lemmatisation Lexicon for the Slovenian language was generated at the Fran Ramovš Institute of Slovenian Language, primarily through inflection of open class words...
- Morphological lexicon Sloleks 1.0
Sloleks is the reference morphological lexicon for the Slovenian language, developed to be used in NLP applications and language manuals. Encoded in LMF XML, the lexicon contains...
- Word embeddings CLARIN.SI-embed.hr 1.0
CLARIN.SI-embed.hr contains word embeddings induced from a large collection of Croatian texts composed of the Croatian web corpus hrWaC and a 400-million-token-heavy collection...
- ReLDI tag+lemma web service for WebLicht
WebLicht (https://weblicht.sfs.uni-tuebingen.de/) registry entry for a web service comprising tokenisation, PoS tagging, and lemmatisation.
- The CLASSLA-StanfordNLP model for lemmatisation of non-standard Croatian 1.1
The model for lemmatisation of non-standard Croatian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the hr500k...
- CMC training corpus Janes-Tag 1.1
Janes-Tag is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is meant as a gold-standard training and testing dataset for tokenisation, sentence...
- The CLASSLA-StanfordNLP model for lemmatisation of non-standard Croatian 1.0
The model for lemmatisation of non-standard Croatian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the hr500k...
- The CLASSLA-StanfordNLP model for lemmatisation of standard Slovenian 1.3
The model for lemmatisation of standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the ssj500k...
- The CLASSLA-StanfordNLP model for lemmatisation of standard Slovenian 1.1
The model for lemmatisation of standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the ssj500k...
- Morphological lexicon Sloleks 3.0
Sloleks is a reference morphological lexicon of Slovene that was developed to be used in various NLP applications and language manuals. It contains Slovene lemmas, their...
- MULTEXT-East non-commercial lexicons 4.0
The MULTEXT-East morphosyntactic lexicons have a simple structure, where each line is a lexical entry with three tab-separated fields: (1) the word-form, the inflected form of...
- Annotated Corpus of Pre-Standardized Balkan Slavic Literature
The corpus contains 15 linguistically annotated samples of "damaskini" and other Balkan Slavic manuscripts and print editions from the 16th-19th century, together with over 30...