CLARIN - Repositories

Liner2 events model

Liner2 model for event and event relation recognition

ENIAM

ENIAM: Categorial Syntactic-Semantic Parser for Polish

PoLitBert_v50k_linear_50k - Polish RoBERTa model

Polish RoBERTa model trained on Polish Wikipedia, Polish literature and OSCAR.

Big Data language model in Word2Vec CBOW format.

Assamese Multi Word Expressions

Multiword expressions are sequences of words, separated by a space (or other) delimiter, that carry a unique meaning distinct from the meanings of the individual words. A list comprising...

Korpus testowy - ludzie (Test corpus: people)

Test corpus of persons

WoSeDon

WoSeDon is a tool for Word Sense Disambiguation. It works on Polish texts and uses plWordNet as the source of possible senses.

EU Territorial Policy Documents 2007-2016 (partitioned)

Corpus of the key documents of the EU territorial policy 2007-2016.

Polish corpus of plWordNet usage examples

Corpus of 83k usage examples taken from plWordNet 3.0, each annotated with a specific sense and all published under open licences.

Świgra — a parser of Polish

Świgra is a parser of Polish generating constituency trees using a DCG style grammar stemming from Marek Świdziński’s grammar “Gramatyka formalna języka polskiego” (1992). The...

DN XXI 213 (trial corpus 2)

This is a trial corpus.

Indexes for djview4poliqarp

This is the archive of the Mercurial repositories formerly available at https://bitbucket.org/jsbien/. They contain indexes to various resources in the DjVu format, in...

Serel (WS)

Serel is a Python framework for recognizing relations between annotations in text.

International Women's Day Corpus

The corpus contains articles from the daily "Trybuna Ludu" from the years 1949-1956. The articles dealt with the situation of women, they were especially concerned with the...

CorpoGrabber-Desktop: The Toolchain to Automatic Acquiring and Extraction of ...

Desktop version of the CorpoGrabber CLI.

MWE Sygietyński

Antoni Sygietyński

NLP Web services and NLP workflow engine

Web-based system for natural language processing of Polish texts. It allows running complex workflows of language and machine learning tools and makes them available via REST Web...

New Gospels

Nowe Ateny

KGR10 FastText Polish word embeddings

Distributional language model (both textual and binary) for Polish (word embeddings) trained on the KGR10 corpus (over 4 billion words) using fastText with the following variants...

MWE 10 Największych (10 largest)

dabrowska_nocednie3_1933.txt prus_emancypantki_1894.txt sienkiewicz_ogniem_1884.txt kaczkowski_grob_1857.txt prus_faraon_1897.txt sienkiewicz_rodzina_1894.txt...