Corpus of the Colloquial Polish Language
The Corpus of the Colloquial Polish Language (CCPL) is a UGC-based corpus tagged with morpho-syntactic features by a team of professional linguists from the Wrocław University...
Wikinews_luty_marzec_2020
Test corpus _ 3_03_20
KPWr annotation guidelines - coreference
Coreference annotation guidelines describing the manual annotation of documents in the Polish Corpus of Wrocław University of Technology (KPWr).
Big Data language model - STEMMED - RAW data
Stemmed Big Data language model in RAW format.
Polish WSD Datasets
Data and code for the paper published at ICCS 2022: "A Unified Sense Inventory for Word Sense Disambiguation in Polish". The code is available at...
1990_Skubiszewski
The first exposé of the Minister of Foreign Affairs (MSZ) of the Third Polish Republic (III RP).
CorpoGrabber
CorpoGrabber: The Toolchain to Automatic Acquiring and Extraction of the Website Content. Jan Kocoń, Wroclaw University of Technology. CorpoGrabber is a pipeline of tools to get...
AspectEmo 1.0: Multi-Domain Corpus of Consumer Reviews for Aspect-Based Senti...
The AspectEmo 1.0 Corpus is an extended version of the publicly available PolEmo 2.0 corpus of Polish customer reviews, which was used in many projects on the use of different methods...
MWE Świętochowski
Aleksander Świętochowski
MWE Sienkiewicz, Ogniem i mieczem
Henryk Sienkiewicz
Word Embeddings for Polish
Distributional language models for Polish trained on different corpora (KGR10, NKJP, Wikipedia).
ELMo Embeddings for Polish
An ELMo embedding model for Polish trained on large text corpora (KGR10). To retrain the model, please use the checkpoint and vocabulary files available at:...
zmiany klimatu kraków (climate change, Kraków)
Workshops in Kraków: sociology.
Liner2
Recognizes proper names in Polish text.
Inforex
Inforex is a web-based system designed for managing and annotating text corpora at the semantic level, including annotation of Named Entities (NE), anaphora, Word Sense...
KGR10-RoBERTa
Polish RoBERTa model pre-trained on the KGR10 corpus.
Eesti-läti ehitusalane paralleelkorpus Estonian-Latvian Parallel Corpus of b...
Parallel corpus of informational texts about building foams and sealants in Latvian and Estonian.
Korpus przemówień przedwyborczych Baracka Obamy (Corpus of Barack Obama's pre-election speeches)
Text corpus of Barack Obama's speeches from 2006-2015.
fronda
A selection of texts from fronda.pl.
