-
Provenance of annotation: a survey on multiple annotations for applications i...
Poster presented at the "Workshop on Data Provenance and Annotation in Computational Linguistics 2018" in Prague (co-located with TLT16). Abstract: It is... -
Replication data for: Pedagogical applications of learner corpora
This dataset contains replication and supplementary documentation of a systematic review of peer-reviewed research published in the period from 01 January 2014 to 08 August 2023... -
Atlas de Datos
"Atlas de Datos" is a catalogue of digital collections and corpora of texts and documents in Spanish. -
CEN
Corpus of Economic News (CEN) contains 797 documents from Polish Wikipedia annotated with 65 categories of proper names in ccl format.... -
Parallel Corpora from Comparable Corpora tool
The script consists of two parts: an article parser and an aligner. Required software (install before using the script): yalign; additional Ubuntu packages: mongodb, ipython, python-nose... -
Wcrft test
Wcrft test -
Smyrna
Smyrna is a tool for building and searching your own Polish corpora from HTML files. -
Polish-Lithuanian Parallel Corpus
Database -
Polish Corpus of Wrocław University of Technology 1.2 Korpus Języka Polskieg...
KPWr (Polish Corpus of Wrocław University of Technology; Polish: Korpus Języka Polskiego Politechniki Wrocławskiej) is a corpus of written and spoken documents available on the... -
PDT-Vallex: Czech Valency lexicon linked to treebanks
The valency lexicon PDT-Vallex has been built in close connection with the annotation of the Prague Dependency Treebank project (PDT) and its successors (mainly the Prague...