Dataset - B2FIND

Czech Models (MorfFlex CZ 2.0 + PDT-C 1.0) for MorphoDiTa 220710

Czech models for MorphoDiTa, providing morphological analysis, morphological generation and part-of-speech tagging. The morphological dictionary is created from MorfFlex CZ 2.0,...

Czech Models (MorfFlex CZ + PDT) for MorphoDiTa

Czech models for MorphoDiTa, providing morphological analysis, morphological generation and part-of-speech tagging. The morphological dictionary is created from MorfFlex CZ and...

POS Tagging and Lemmatization (Czech model)

Model trained for Czech POS tagging and lemmatization using RobeCzech, a Czech version of the BERT model. The model is trained on data from the Prague Dependency Treebank 3.5. The model is a part...

English Models (Morphium + WSJ) for MorphoDiTa

English models for MorphoDiTa, providing morphological analysis, morphological generation and part-of-speech tagging. The morphological dictionary is created from Morphium and...

A morphological layer for the German part of the SMULTRON corpus

A morphological layer for the German part of the SMULTRON corpus. The layer was annotated according to the STTS tagset and the annotation guidelines of the Tiger corpus....

Indonesian web corpus (idWac)

Indonesian text corpus from the web. Crawled by SpiderLing in 2017. Filtered by JusText and Onion (see http://corpus.tools/ for details). Tagged and lemmatized by MorphInd...

Czech Models (MorfFlex CZ 161115 + PDT 3.0) for MorphoDiTa 161115

Czech models for MorphoDiTa, providing morphological analysis, morphological generation and part-of-speech tagging. The morphological dictionary is created from MorfFlex CZ...

Czech Models (MorfFlex CZ 160310 + PDT 3.0) for MorphoDiTa 160310

Czech models for MorphoDiTa, providing morphological analysis, morphological generation and part-of-speech tagging. The morphological dictionary is created from MorfFlex CZ...

Engineering job ads corpus

The corpus presented consists of job ads in Spanish related to Engineering positions in Peru. The documents were preprocessed and annotated for POS tagging, NER, and topic...

Large-Scale Colloquial Persian 0.5

"Large Scale Colloquial Persian Dataset" (LSCP) is hierarchically organized in a semantic taxonomy that focuses on multi-task informal Persian language understanding as a...

Word representations for multiple languages

Dictionaries with different representations for various languages. Representations include Brown clusters of different sizes and morphological dictionaries extracted using...

Slovak MorphoDiTa Models 170914

Slovak models for MorphoDiTa, providing morphological analysis, morphological generation and part-of-speech tagging. The morphological dictionary is created from MorfFlex SK...

12 datasets found