653 datasets found

Language: Polish

  • Tagger SentiOne - version 2

    This is the second version of the morpho-syntactic tagger for the Polish language, adapted to UGC-processing. It has been enriched with some heuristics to improve its accuracy...
  • MWE Wiek XIX

    balucki_burmistrz_1887.txt balucki_murzyn_1875.txt balucki_przebudzeni_1864.txt beczkowska_bedzie_1897.txt beczkowska_droga_1898.txt beczkowska_gniezdzie_1899.txt...
  • HaskPL

    HaskPL is a Polish phraseological database designed for language professionals including linguists, language teachers, lexicographers, language materials developers and...
  • KPWr Events

    A set of documents annotated with event mentions extracted from the KPWr corpus. The annotation process is described in the article: Marcińczuk, M., Oleksy et al. (2015). The...
  • PELCRA for National Corpus of Polish Search Engine 2

    The PELCRA for NKJP search engine 2 provides access to the full National Corpus of Polish dataset (over 1.5 billion word tokens). In addition to linguistically motivated corpus...
  • KPWr dump r240

    Dump of the Polish Corpus of Wrocław University of Technology (KPWr) containing a set of documents annotated with named entities and keywords.
  • MWE Kraszewski

    Józef Ignacy Kraszewski
  • Constitution

    Text of the constitution
  • MWE Żuławski

    Jerzy Żuławski
  • Lilia

    Sample of historical texts
  • PoLitBert_v32k_cos1_5_50k - Polish RoBERTa model

    Polish RoBERTa model trained on Polish Wikipedia, Polish literature, and the OSCAR corpus.
  • Clarin-PL Studio Corpus (EMU)

    Polish speech corpus of read speech recorded in a studio. Contains many speakers, each reading a few dozen different sentences and a list of words with rare phonemes. Useful for...
  • MWE Marrene

    Waleria Marrené-Morzkowska
  • Blogs_2018

    Texts from book blogs
  • The system of register labels in plWordNet v. 5 (Guidelines)

    The PDF document contains guidelines for describing the register of lexical units in the Polish part of plWordNet.
  • Tekst reklam TVP ABC ver.2

    Text of advertisements broadcast on TVP ABC
  • Big Data language model tagged with POS - RAW.

    Big data language model tagged with POS - RAW
  • plWordNet 4.2 (CLARIN-BIZ-START)

    plWordNet (Słowosieć) from July 2020, used as the main resource for word sense disambiguation tasks in 2020-2022; the database also includes the mapping to Princeton WordNet 3.1...
  • python-g419wikitools-1.0

    A set of Python scripts for generating a dictionary of inflected phrase forms based on Wikipedia's internal links. Analysing a Wikipedia dump produces a set of files containing:...
You can also access this registry using the API (see API Docs).