CLARIN - Repositories

MWE Mostowicz

Tadeusz Dołęga-Mostowicz

WUT Relations Between Sentences Corpus

WUT Relations Between Sentences Corpus contains 2827 pairs of related sentences. Relationships are derived from Cross-document Structure Theory (CST), which enables...

Speech acts in message board posts

Corpus of texts from message boards used for annotating speech acts and local grammar.

Wroclaw Corpus of Consumer Reviews Sentiment (WCCRS)

Wroclaw Corpus of Consumer Reviews is a corpus of Polish reviews annotated with sentiment at the level of the whole text (text) and at the level of sentences (sentence) for the...

Integrated Parser

Integrated parser is an application that combines and normalizes outputs of several parsers for Polish. It is based on ENIAM processing stream extended with Polish Dependency...

Toposław 2 (2016-05-31)

Toposław 2 is an editor of multi-word unit inflection lexicons.

Pan Tadeusz

Poem

Składnica frazowa — a constituency treebank of Polish

Składnica frazowa is a constituency treebank of Polish. The treebank is a result of parsing Polish sentences with the syntactic parser Świgra. For every sentence, the parser...

Lexicalisation of Polish and English word combinations: two samples manually ...

We analysed over 350 Polish and English word combinations (multi-word expressions, MWEs). Half of the sample was drawn from traditional dictionaries, while the other half was...

"Fatalne jaja" Bułhakow

The story "Fatalne jaja" by Michaił Bułhakow

SuperMatrix

SuperMatrix is a system that supports automatic extraction of semantic relations, based on the analysis of large text corpora. The system was developed as a tool for expansion of...

Corpus2MWE

A CCL reader (Corpus2) with MWE detection.

MWE Wiek XX

berent_diogenes_1937.txt berent_kamienie_1918.txt berent_prochno_1903.txt dabrowska_nocednie1_1931.txt dabrowska_nocednie2_1932.txt dabrowska_nocednie3_1933.txt...

PELCRA PARL corpus

The corpus comprises 50 sampled recordings (12 hours) and manual transcriptions (ca. 101 00 word tokens) of parliamentary data.

Lalka - całość

A book in Polish by Bolesław Prus

Knowledge base of Polish conventionalized periphrastic nominal expressions

The resource includes free Periphraser export with a knowledge base of Polish conventionalized periphrastic nominal expressions (i.e. phrases headed by a noun) together with...

Polish Dependency Bank

Polish Dependency Bank (PDB) is the largest set of manually annotated dependency trees. PDB consists of more than 22K trees, with 15.8 tokens per sentence on average.

Description of nominal lexico-semantic relations in plWordNet 4.0 (Guidelines)

The PDF document contains guidelines for the description of nouns in the Polish part of plWordNet.

Polish Spatial Texts (PST) 2.0

The extended version of the Polish Spatial Texts corpus. Texts derived from Polish travel blogs, manually annotated with spatial expressions. A spatial expression is a text fragment...

The LnNor Corpus: A spoken multilingual corpus of non-native and native Norwe...

The LnNor corpus was created as part of the data collection in two projects: CLIMAD (Crosslinguistic influence in multilingualism across domains: phonology and syntax) and ADIM...

4,938 datasets found