-
ENIAMtoolkit
ENIAMtoolkit is a collection of libraries that perform tokenization, lemmatization, and part-of-speech tagging; detect multiword expressions (MWEs) and abbreviations; and split text into sentences. -
Street name changes in Poznań, Słubice and Zbąszyń, Poland 1916-2018
The corpus presents a historical overview of street and place (park, bridge, square) name changes in the years 1916-2018 for three Polish cities: Poznań, Słubice and Zbąszyń.... -
POLFIE-OT: an LFG grammar of Polish with OT marks
POLFIE-OT is a version of POLFIE, an LFG grammar of Polish implemented in the XLE system (Xerox Linguistic Environment), enriched with OT (Optimality Theory) constraints for the... -
Wiki train - 34 categories
Wikipedia, 34 categories: a training set for a classifier -
WCRFT WebLichtService
WCRFT service for WebLicht -
Vector Extractor
Collocations presented are based on co-occurrences of a selected noun with several features describing it and linked with it by syntactic dependencies. The recognised features... -
Big Data language model in FastText CBOW format
Big Data language model in FastText CBOW format -
SpakowanesermonyEN
Sermons -
Open license texts sample
A sample corpus of texts distributed under an open license. It consists of 20 documents in TXT, DOCX, DOC or ODT format. -
Plumper
An ontology mapper that maps plWordNet onto the SUMO ontology. -
Polimorf
PoliMorf is a morphological dictionary for Polish resulting from the standardization and merger of Morfeusz SGJP and Morfologik. The present version includes extended... -
MultiEmo: Multilingual, Multilevel, Multidomain Sentiment Analysis Corpus of ...
MultiEmo is a new benchmark data set for the multilingual sentiment analysis task, covering 11 languages. The collection contains consumer reviews from four domains: medicine,... -
Polish-Ukrainian Parallel Corpus
Polish-Ukrainian Parallel Corpus -
Corpus_Sienkiewicz_Novels
Sienkiewicz Novels -
diachronic1
HISTORY -
PoLitBert_v32k_cos1_2_50k - Polish RoBERTa model
Polish RoBERTa model trained on Polish Wikipedia, Polish literature and Oscar. -
WordnetLoom
WordnetLoom is a wordnet editor application built to support the construction of plWordNet, the largest Polish wordnet. WordnetLoom provides two means of... -
The system of the diagnostics in plWordNet
The PDF document describes the most frequent regular errors in plWordNet and the rules for their semi-automatic correction. -
TreeHopper (TreeLSTM): sentiment at the sentence and phrase level
A Tree-LSTM-based dependency tree sentiment labeler -
CEN
The Corpus of Economic News (CEN) contains 797 documents from Polish Wikipedia annotated with 65 categories of proper names in CCL format....
