-
Annotated sample of the Slovenian Biographical Lexicon SBL-51abbr 1.0
This dataset consists of 51 randomly selected entries from the Slovenian Biographical Lexicon (1925–1991). The text of each entry has been manually tokenised and sentence... -
The CLASSLA-StanfordNLP model for lemmatisation of non-standard Serbian 1.1
The model for lemmatisation of non-standard Serbian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the SETimes.SR... -
Word embeddings CLARIN.SI-embed.sl 1.0
CLARIN.SI-embed.sl contains word embeddings induced from a large collection of Slovene texts composed of existing corpora of Slovene, e.g GigaFida, Janes, KAS, slWaC etc. The... -
Morphological lexicon Sloleks 1.2
Sloleks is the reference morphological lexicon for Slovenian language, developed to be used in NLP applications and language manuals. Encoded in LMF XML, the lexicon contains... -
The CLASSLA-StanfordNLP model for lemmatisation of standard Croatian 1.1
The model for lemmatisation of standard Croatian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the hr500k training... -
The CLASSLA-StanfordNLP model for lemmatisation of non-standard Serbian 1.0
The model for lemmatisation of non-standard Serbian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the SETimes.SR... -
The CLASSLA-StanfordNLP model for lemmatisation of standard Slovenian
The model for lemmatisation of standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the ssj500k... -
CMC training corpus Janes-Tag 1.0
Janes-Tag is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is meant as a gold-standard training and testing dataset for tokenisation, sentence... -
The CLASSLA-StanfordNLP model for lemmatisation of standard Serbian 1.2
The model for lemmatisation of standard Serbian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the SETimes.SR... -
The CLASSLA-StanfordNLP model for lemmatisation of standard Serbian 1.1
The model for lemmatisation of standard Serbian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the SETimes.SR... -
Transkriptionskonventionen im Vergleich
Synopsis of transcription conventions used in six international sign language research projects including annotation tool and tiers in transcripts, divided into conventional... -
Die Erstellung von Fachgebärdenlexika am Institut für Deutsche Gebärdensprach...
Detailed description of how six corpus-based LSP dictionaries German – German Sign Language (DGS) were produced including elicitation methods, annotation and...