-
PLWordNet to Sumo mapping
Result of mapping plWordNet onto the SUMO ontology. -
POE: Microcorpus of 20th century Polish poetry
Microcorpus of 20th century Polish poetry -
MWE Dmochowska
Emma Dmochowska -
Andersen fairy tales
Some fairy tales of Hans Christian Andersen. -
Guidelines for Emotive Annotation in plWordNet 4.0
Guidelines for emotive annotation in plWordNet 4.0. The set consists of four documents dedicated to emotive annotation of four parts of speech - verb, adverb, noun and adjective. -
MWE Kaczkowski
Zygmunt Kaczkowski -
Periphraser
Periphraser is a tool for storing and presenting a knowledge base of conventionalized periphrastic nominal expressions (i.e. phrases headed by a noun) together with their... -
Cleaned Polish Oscar corpus (128M above lines)
Cleaned Polish Oscar corpus (part: 128M above lines, 1.93 GB). Data was prepared with a few cleaning heuristics: - remove sentences shorter than - remove non-Polish... -
Speech Recognition System for Polish: Polish Film Chronicles
This resource contains dockerized models and scripts of an automatic speech recognition system for Polish trained on recordings of the Polish Film Chronicles. The system is based... -
NamEnSim
NamEnSim (Named Entity Similarity Function) is a tool for evaluating word pair similarity using a heterogeneous named entity similarity function. This tool allows... -
Big data language model tagged with POS - ARPA
Big Data language model tagged with POS - ARPA -
Big Data language model with grammatical groups - RAW
Big Data Language model tagged with grammatical groups in RAW format. -
PolEmo 2.0 Sentiment Analysis Dataset for CoNLL
PolEmo 2.0: Corpus of Multi-Domain Consumer Reviews, evaluation data for article presented at CoNLL Citation: @inproceedings{kocon-etal-2019-multi, title = "Multi-Level... -
tokenizer
Tokenizer is a tool with which one can design dedicated tokenizers for texts from a domain of interest. -
Sample20
prus_faraon_1897.txt balucki_przebudzeni_1864.txt reymont_komediantka_1896.txt zeromski_syzyfowe_1897.txt zapolska_kaska_1888.txt kraszewski_piast_1888.txt... -
Speech activity annotation for a subset of the Clarin-PL studio corpus
This is a hand-checked annotation of speech activity within a subset of the Clarin-PL studio corpus, containing 20 sessions with 619 recordings. This submission does not contain... -
Register of multi-word expressions deleted from plWordNet after verification
A dataset of multi-word expressions deleted from plWordNet after manual verification of their lexicality status. -
MWE Sienkiewicz, Rodzina Połanieckich
Henryk Sienkiewicz -
Transcriptions of the Polish Film Chronicles (Polska Kronika Filmowa) - years...
This is the orthographic transcription of the audio of the Polish Film Chronicles (Polska Kronika Filmowa - PKF) between the years 1945-1962. The transcription is mostly... -
Mining blogs
Blogs about the mining industry. Date range: 01.01.2015 - 20.06.2015. Posts together with their comments.