A morphological layer for the German part of the SMULTRON corpus
A morphological layer for the German part of the SMULTRON corpus. The layer was annotated according to the STTS tagset and the annotation guidelines of the Tiger corpus....
Indonesian web corpus (idWac)
Indonesian text corpus from the web. Crawling was done by SpiderLing in 2017. Filtering by JusText and Onion (see http://corpus.tools/ for details). Tagged and lemmatized by MorphInd...
Czech Models (MorfFlex CZ 160310 + PDT 3.0) for MorphoDiTa 160310
Czech models for MorphoDiTa, providing morphological analysis, morphological generation and part-of-speech tagging. The morphological dictionary is created from MorfFlex CZ...
Engineering job ads corpus
The corpus presented consists of job ads in Spanish related to Engineering positions in Peru. The documents were preprocessed and annotated for POS tagging, NER, and topic...
Czech Models (MorfFlex CZ + PDT) for MorphoDiTa
Czech models for MorphoDiTa, providing morphological analysis, morphological generation and part-of-speech tagging. The morphological dictionary is created from MorfFlex CZ and...
Word representations for multiple languages
Dictionaries with different representations for various languages. Representations include Brown clusters of different sizes and morphological dictionaries extracted using...
Slovak MorphoDiTa Models 170914
Slovak models for MorphoDiTa, providing morphological analysis, morphological generation and part-of-speech tagging. The morphological dictionary is created from MorfFlex SK...
Czech Models (MorfFlex CZ 2.0 + PDT-C 1.0) for MorphoDiTa 220710
Czech models for MorphoDiTa, providing morphological analysis, morphological generation and part-of-speech tagging. The morphological dictionary is created from MorfFlex CZ 2.0,...
English Models (Morphium + WSJ) for MorphoDiTa
English models for MorphoDiTa, providing morphological analysis, morphological generation and part-of-speech tagging. The morphological dictionary is created from Morphium and...
POS Tagging and Lemmatization (Czech model)
Model trained for Czech POS tagging and lemmatization using the Czech version of the BERT model, RobeCzech. The model is trained on data from the Prague Dependency Treebank 3.5. The model is a part...
Large-Scale Colloquial Persian 0.5
"Large Scale Colloquial Persian Dataset" (LSCP) is hierarchically organized in a semantic taxonomy that focuses on multi-task informal Persian language understanding as a...
Czech Models (MorfFlex CZ 161115 + PDT 3.0) for MorphoDiTa 161115
Czech models for MorphoDiTa, providing morphological analysis, morphological generation and part-of-speech tagging. The morphological dictionary is created from MorfFlex CZ...