POS Tagging and Lemmatization (Czech model)
Model trained for Czech POS tagging and lemmatization using the Czech version of the BERT model, RobeCzech. The model is trained on data from the Prague Dependency Treebank 3.5. The model is a part...
Indonesian web corpus (idWac)
Indonesian text corpus from the web. Crawling was done by SpiderLing in 2017. Filtered by JusText and Onion (see http://corpus.tools/ for details). Tagged and lemmatized by MorphInd...
Universal Dependencies 2.0 Models for UDPipe (2017-08-01)
Tokenizer, POS Tagger, Lemmatizer and Parser models for all 50 languages of Universal Dependencies 2.0 Treebanks, created solely using UD 2.0 data...
Universal Dependencies 1.2 Models for UDPipe
Tokenizer, POS Tagger, Lemmatizer and Parser models for all Universal Dependencies 1.2 Treebanks, created solely using UD 1.2 data (http://hdl.handle.net/11234/1-1548). To use...
UDPipe
UDPipe is a trainable pipeline for tokenization, tagging, lemmatization and dependency parsing of CoNLL-U files. UDPipe is language-agnostic and can be trained given only...
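Since UDPipe reads and writes CoNLL-U, it can help to see what consuming that format involves. The following is a minimal sketch, not part of UDPipe itself: a small Python function that splits CoNLL-U text into sentences of token dictionaries keyed by the ten standard CoNLL-U columns. The sample sentence is hand-written for illustration, not actual UDPipe output, and the names `parse_conllu`, `FIELDS`, and `SAMPLE` are hypothetical. Multiword-token ranges (e.g. `1-2`) and empty nodes (e.g. `1.1`) are simply kept as ordinary rows here.

```python
from typing import Dict, List

# The ten CoNLL-U columns, in order, as defined by the Universal Dependencies format.
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Split CoNLL-U text into sentences, each a list of token dicts (hypothetical helper)."""
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
        elif line.startswith("#"):
            # Comment lines (e.g. "# text = ...") carry sentence metadata; skipped in this sketch.
            continue
        else:
            cols = line.split("\t")
            if len(cols) != len(FIELDS):
                raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}")
            # Multiword-token and empty-node rows are kept as ordinary rows here.
            current.append(dict(zip(FIELDS, cols)))
    # Handle input that does not end with a blank line.
    if current:
        sentences.append(current)
    return sentences


# Hand-written illustrative input (hypothetical, not real UDPipe output).
SAMPLE = (
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\tVBP\tMood=Ind\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_\n"
    "\n"
)

if __name__ == "__main__":
    for tok in parse_conllu(SAMPLE)[0]:
        print(tok["form"], tok["lemma"], tok["upos"], tok["head"], tok["deprel"])
```

Returning plain dictionaries keeps the sketch dependency-free; for real work, a dedicated CoNLL-U library or UDPipe's own bindings would handle the format's edge cases more thoroughly.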
Universal Dependencies 2.4 Models for UDPipe (2019-05-31)
Tokenizer, POS Tagger, Lemmatizer and Parser models for 90 treebanks of 60 languages of Universal Dependencies 2.4 Treebanks, created solely using UD 2.4 data...
Czech Verbal MWEs
Lexicon of Czech verbal multiword expressions (VMWEs) used in Parseme Shared Task 2017....
The Diorisis Ancient Greek Corpus
An annotated corpus of literary Ancient Greek sourced from the Perseus Canonical Greek Lit repository (https://github.com/PerseusDL/canonical-greekLit), “The Little Sailing”...
CoNLL 2018 Shared Task - UDPipe Baseline Models and Supplementary Materials
Baseline UDPipe models for CoNLL 2018 Shared Task in UD Parsing, and supplementary material. The models require UDPipe version 1.2 or later and are evaluated using the official...
Universal Dependencies 2.6 models for UDPipe 2 (2020-08-31)
Tokenizer, POS Tagger, Lemmatizer and Parser models for 99 treebanks of 63 languages of Universal Dependencies 2.6 Treebanks, created solely using UD 2.6 data...
Universal Dependencies 2.3 Models for UDPipe (2018-11-15)
Tokenizer, POS Tagger, Lemmatizer and Parser models for 84 treebanks of 56 languages of Universal Dependencies 2.3 Treebanks, created solely using UD 2.3 data...
Prague Dependency Treebank - Consolidated 2.0 (PDT-C 2.0)
A manually annotated and genre-diversified language resource with rich linguistic information from morphology and syntax to semantics, the Prague Dependency Treebank –...
Universal Dependencies 2.10 models for UDPipe 2 (2022-07-11)
Tokenizer, POS Tagger, Lemmatizer and Parser models for 123 treebanks of 69 languages of Universal Dependencies 2.10 Treebanks, created solely using UD 2.10 data...
CoNLL 2017 Shared Task - UDPipe Baseline Models and Supplementary Materials
Baseline UDPipe models for CoNLL 2017 Shared Task in UD Parsing, and supplementary material. The models require UDPipe version 1.1 or later and are evaluated using the official...
Persian Morphologically Segmented Lexicon 0.5
This dataset includes 45,300 Persian word forms, manually segmented into sequences of morphemes.
Universal Dependencies 2.12 models for UDPipe 2 (2023-07-17)
Tokenizer, POS Tagger, Lemmatizer and Parser models for 131 treebanks of 72 languages of Universal Dependencies 2.12 Treebanks, created solely using UD 2.12 data...
Czech PDT-C 1.0 Model for UDPipe 2 (2023-11-16)
Tokenizer, POS Tagger, Lemmatizer, and Parser model based on the PDT-C 1.0 treebank (https://hdl.handle.net/11234/1-3185). The model documentation including performance can be...
Universal Dependencies 2.15 models for UDPipe 2 (2024-11-21)
Tokenizer, POS Tagger, Lemmatizer and Parser models for 147 treebanks of 78 languages of Universal Dependencies 2.15 Treebanks, created solely using UD 2.15 data...
Universal Dependencies 2.5 Models for UDPipe (2019-12-06)
Tokenizer, POS Tagger, Lemmatizer and Parser models for 94 treebanks of 61 languages of Universal Dependencies 2.5 Treebanks, created solely using UD 2.5 data...
CorpusExplorer
Software for corpus linguists and text/data mining enthusiasts. The CorpusExplorer combines over 45 interactive visualizations under a user-friendly interface. Routine tasks...