-
OK, Computer, what are these books about? - data files
The core of this experiment is the entity-fishing algorithm, as created and deployed by DARIAH. In the simplest terms: it scans texts for terms that can be linked... -
Context-Aware Representations for Knowledge Base Relation Extraction
We provide a subcorpus of Wikipedia that was annotated with Wikidata relations using a distant supervision procedure. The corpus contains two types of annotations: entities and... -
CEN
The Corpus of Economic News (CEN) contains 797 documents from Polish Wikipedia annotated with 65 categories of proper names in ccl format.... -
Wiki train - 34 categories
Wikipedia, 34 categories - training set for the classifier -
Wiki test - 34 categories
Wikipedia, 34 categories - test set for the classifier -
Wikinews korpus próbny
A set of files containing Wikinews content -
Próbny korpus
Wikinews articles -
Wikipedia Infobox Mapping PL
Mapping between infobox attributes used in Polish Wikipedia and KPWr named entity schema. -
Parallel Corpora from Comparable Corpora tool
The script consists of 2 parts: an article parser and an aligner. Required software (install before using the script): yalign; additional Ubuntu packages: mongodb, ipython, python-nose... -
wikinewsy
Wikinews articles -
Korpus test - Wikinews
Test database for classes -
W2C – Web to Corpus – tool
A tool for building multilingual corpora from Wikipedia. It downloads web pages, converts them to plain text, identifies the language, etc. A set of 120 corpora collected using this...