-
Tagger SentiOne - version 2
This is the second version of the morpho-syntactic tagger for Polish, adapted to processing user-generated content (UGC). It has been enriched with heuristics to improve its accuracy... -
MWE Wiek XIX
balucki_burmistrz_1887.txt balucki_murzyn_1875.txt balucki_przebudzeni_1864.txt beczkowska_bedzie_1897.txt beczkowska_droga_1898.txt beczkowska_gniezdzie_1899.txt... -
HaskPL
HaskPL is a Polish phraseological database designed for language professionals including linguists, language teachers, lexicographers, language materials developers and... -
MWE Dabrowska, Noce i dnie, Tom 3
Maria Dąbrowska -
KPWr Events
A set of documents annotated with event mentions, extracted from the KPWr corpus. The annotation process is described in the article: Marcińczuk, M., Oleksy et al. (2015). The... -
PELCRA for National Corpus of Polish Search Engine 2
The PELCRA for NKJP search engine 2 provides access to the full National Corpus of Polish dataset (over 1.5 billion word tokens). In addition to linguistically motivated corpus... -
KPWr dump r240
Dump of the Polish Corpus of Wrocław University of Technology (KPWr) containing a set of documents annotated with named entities and keywords. -
MWE Kraszewski
Józef Ignacy Kraszewski -
Constitution
Text of the constitution -
MWE Żuławski
Jerzy Żuławski -
Lilia
Sample of historical texts -
PoLitBert_v32k_cos1_5_50k - Polish RoBERTa model
Polish RoBERTa model trained on Polish Wikipedia, Polish literature and Oscar. -
Clarin-PL Studio Corpus (EMU)
Polish speech corpus of read speech recorded in a studio. Contains many speakers, each reading a few dozen different sentences and a list of words with rare phonemes. Useful for... -
MWE Marrene
Waleria Marrené-Morzkowska -
Blogs_2018
Texts from book blogs -
The system of register labels in plWordNet v. 5 (Guidelines)
The PDF document contains guidelines for describing the register of lexical units in the Polish part of plWordNet -
Tekst reklam TVP ABC ver.2
Text of advertisements broadcast on TVP ABC -
Big Data language model tagged with POS - RAW.
Big data language model tagged with POS - RAW -
plWordNet 4.2 (CLARIN-BIZ-START)
plWordNet (Słowosieć) from July 2020, used as the main resource for word sense disambiguation tasks in 2020-2022; the database also includes the mapping to Princeton WordNet 3.1... -
python-g419wikitools-1.0
A set of Python scripts for generating a dictionary of phrase inflections based on Wikipedia's internal links. The analysis of a Wikipedia dump produces a set of files containing:...