Dataset - B2FIND

Big data language model with part of speech tags stemmed in ARPA format

MWE Korzeniowski

Józef Korzeniowski

Cleaned Polish Oscar corpus (96M lines)

Cleaned Polish Oscar corpus (part: 96M lines, 3.49 GB). Data was prepared with a few cleaning heuristics: - remove sentences shorter than - remove non-Polish sentences...
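
The two heuristics named in the description above could be sketched roughly as follows. This is only an illustration, not the dataset authors' pipeline: the length threshold is elided in the source, so `min_tokens` is a hypothetical caller-supplied parameter (and counting whitespace tokens is itself an assumption), and `looks_polish` is a toy stand-in for whatever language identification was actually used.

```python
from typing import Callable, Iterable, Iterator

# Assumption: a toy signal for "Polish", not a real language identifier.
_POLISH_CHARS = set("ąćęłńóśźż")
_POLISH_HINTS = {"nie", "jest", "oraz", "przez", "tylko"}


def looks_polish(sentence: str) -> bool:
    """Hypothetical heuristic: a Polish diacritic or a common Polish word is present."""
    lowered = sentence.lower()
    if any(ch in _POLISH_CHARS for ch in lowered):
        return True
    return any(token in _POLISH_HINTS for token in lowered.split())


def clean_corpus(
    lines: Iterable[str],
    min_tokens: int,
    is_polish: Callable[[str], bool] = looks_polish,
) -> Iterator[str]:
    """Yield stripped lines that pass both filters.

    min_tokens is hypothetical: the source does not state the real threshold.
    """
    for line in lines:
        sentence = line.strip()
        # Heuristic 1: drop sentences shorter than the threshold.
        if len(sentence.split()) < min_tokens:
            continue
        # Heuristic 2: drop sentences that do not look Polish.
        if not is_polish(sentence):
            continue
        yield sentence


sample = [
    "Tak.",
    "To jest bardzo długie zdanie po polsku.",
    "This is a fairly long English sentence.",
]
result = list(clean_corpus(sample, min_tokens=3))
```

For a corpus of this size, a streaming generator like this avoids holding all 96M lines in memory; a production pipeline would likely replace `looks_polish` with a trained language identifier.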

MWE Zarzycka

Irena Zarzycka

MWE Wiek XX

berent_diogenes_1937.txt berent_kamienie_1918.txt berent_prochno_1903.txt dabrowska_nocednie1_1931.txt dabrowska_nocednie2_1932.txt dabrowska_nocednie3_1933.txt...

Big data language model stemmed in ARPA format

Big data language model stemmed in ARPA format.

Big data language model with part of speech tags stemmed in RAW format

Big data language model stemmed with BPE in ARPA format

Corpus of the colloquial Polish language

The corpus of the colloquial Polish language is a UGC-based corpus tagged with morpho-syntactic features by a team of professional linguists from the Wrocław University of...

Sample20

prus_faraon_1897.txt balucki_przebudzeni_1864.txt reymont_komediantka_1896.txt zeromski_syzyfowe_1897.txt zapolska_kaska_1888.txt kraszewski_piast_1888.txt...

MWE Zapolska

Gabriela Zapolska

Poliqarp2

Poliqarp2 is a linguistic search engine capable of searching through large corpora annotated on multiple levels. It is not an upgraded version of Poliqarp; it is a...

MWE Reymont

Władysław Reymont

MWE Żuławski

Jerzy Żuławski

MWE Marrene

Waleria Marrené-Morzkowska

MWE Wiek XIX

balucki_burmistrz_1887.txt balucki_murzyn_1875.txt balucki_przebudzeni_1864.txt beczkowska_bedzie_1897.txt beczkowska_droga_1898.txt beczkowska_gniezdzie_1899.txt...

MWE Deotyma

Deotyma

Żeromski

Stefan Żeromski - small corpus

MWE Mostowicz

Tadeusz Dołęga-Mostowicz

Polish corpus of plWordNet usage examples

A corpus of 83k usage examples taken from plWordNet 3.0. Every example is annotated with a specific sense, and all are published under open licences.

161 datasets found