Dataset - B2FIND

Replication data for: Pedagogical applications of learner corpora

This dataset contains replication and supplementary documentation of a systematic review of peer-reviewed research published in the period from 01 January 2014 to 08 August 2023...

PDT-Vallex: Czech Valency lexicon linked to treebanks

The valency lexicon PDT-Vallex has been built in close connection with the annotation of the Prague Dependency Treebank project (PDT) and its successors (mainly the Prague...

Atlas de Datos

"Atlas de Datos" is a catalog of digital collections and corpora of texts and documents in Spanish.

Provenance of annotation: a survey on multiple annotations for applications i...

Poster presented at the "Workshop on Data Provenance and Annotation in Computational Linguistics 2018" in Prague (co-located with TLT16). Abstract: It is...

Polish-Lithuanian Parallel Corpus

Database

Smyrna

Smyrna is a tool for building and searching your own Polish corpora from HTML files.

Parallel Corpora from Comparable Corpora tool

Script consists of 2 parts: article parser and aligner. Required software (install before using script): yalign, plus additional Ubuntu packages: mongodb, ipython, python-nose...

Polish Corpus of Wrocław University of Technology 1.2 Korpus Języka Polskieg...

KPWr (Polish Corpus of Wrocław University of Technology, pol. Korpus Języka Polskiego Politechniki Wrocławskiej) is a corpus of written and spoken documents available on the...

Mochnacki

Corpus of Mochnacki's texts

Polish Corpus of Wrocław University of Technology 1.1 Korpus Języka Polskieg...

KPWr (Polish Corpus of Wrocław University of Technology, pl. Korpus Języka Polskiego Politechniki Wrocławskiej) is a corpus of written and spoken documents available on the...

Wcrft test

CEN

Corpus of Economic News (CEN) contains 797 documents from Polish Wikipedia annotated with 65 categories of proper names in ccl format....

12 datasets found