Big data language model with part of speech tags stemmed in ARPA format
Big data language model with part of speech tags stemmed in ARPA format -
MWE Korzeniowski
Józef Korzeniowski -
Cleaned Polish Oscar corpus (96M lines)
Cleaned Polish Oscar corpus (part: 96M lines, 3.49 GB). The data was prepared with a few cleaning heuristics: - remove sentences shorter than - remove non-Polish sentences... -
MWE Zarzycka
Irena Zarzycka -
MWE Wiek XX
berent_diogenes_1937.txt berent_kamienie_1918.txt berent_prochno_1903.txt dabrowska_nocednie1_1931.txt dabrowska_nocednie2_1932.txt dabrowska_nocednie3_1933.txt... -
Big data language model stemmed in ARPA format
Big data language model stemmed in ARPA format. -
Big data language model with part of speech tags stemmed in RAW format
Big data language model with part of speech tags stemmed in RAW format -
Big data language model stemmed with BPE in ARPA format
Big data language model stemmed with BPE in ARPA format -
Corpus of the colloquial Polish language
The corpus of the colloquial Polish language is a UGC-based corpus tagged with morpho-syntactic features by a team of professional linguists from the Wrocław University of... -
Sample20
prus_faraon_1897.txt balucki_przebudzeni_1864.txt reymont_komediantka_1896.txt zeromski_syzyfowe_1897.txt zapolska_kaska_1888.txt kraszewski_piast_1888.txt... -
MWE Zapolska
Gabriela Zapolska -
Poliqarp2
Poliqarp2 is a linguistic search engine capable of searching through large corpora annotated on multiple levels. It is not an upgraded version of Poliqarp; it is a... -
MWE Reymont
Władysław Reymont -
MWE Żuławski
Jerzy Żuławski -
MWE Marrene
Waleria Marrené-Morzkowska -
MWE Wiek XIX
balucki_burmistrz_1887.txt balucki_murzyn_1875.txt balucki_przebudzeni_1864.txt beczkowska_bedzie_1897.txt beczkowska_droga_1898.txt beczkowska_gniezdzie_1899.txt... -
MWE Deotyma
Deotyma -
Żeromski
Stefan Żeromski - small corpus -
MWE Mostowicz
Tadeusz Dołęga-Mostowicz -
Polish corpus of plWordNet usage examples
Corpus of 83k usage examples taken from plWordNet 3.0. Each example is annotated with a specific sense, and all are published under open licences.