Dataset - B2FIND

Wordnet-based Evaluation of Large Distributional Models for Polish

The paper presents the construction of large-scale test datasets for word embeddings on the basis of a very large wordnet. They were then applied to the evaluation of word embedding...

Large Language Models for Research Data Management?! 2025 (LLMs4RDM 2025)

Research data management (RDM) has become an important discipline that enables researchers to effectively organise, preserve and share their research results. RDM is a new...

Hamburg Corpus of Polish in Germany (HamCoPoliG)

Audio recordings of German/Polish bilingual and Polish monolingual adults (16-46 years). Recordings of semi-spontaneous data (3 topics) and renarration of a picture story. The...

Hamburg Corpus of Polish in Germany (HamCoPoliG)

This data set contains the same data as version 1.0.0. The only difference is that the data is split into smaller archives to make it easier to download. You can either...

Hamburg Corpus of Polish in Germany (HamCoPoliG)

Original Data: Audio recordings of German/Polish bilingual and Polish monolingual adults (16-46 years). Recordings of semi-spontaneous data (3 topics) and renarration of a...

Hamburg Corpus of Polish in Germany (HamCoPoliG)

This corpus version is deprecated in favor of version 0.2.

Replication data for: V-temporal adverbials in Slavic

The database includes 271 Russian examples and their equivalents in Ukrainian, Belarusian, Polish and Czech. The data were culled from the ParaSol parallel corpus (see...

Replication Data for: Accusative of Negation in ‘Borderland’ Polish

These are the data for a journal article on "Accusative of Negation in 'Borderland' Polish". The abstract of the article is below. The data consist of the annotated list of...

Khresmoi Query Translation Test Data 2.0

This package contains data sets for development and testing of machine translation of medical queries between Czech, English, French, German, Hungarian, Polish, Spanish and...

Khresmoi Summary Translation Test Data 2.0

This package contains data sets for development (Section dev) and testing (Section test) of machine translation of sentences from summaries of medical articles between Czech,...

Community Interpreting Database Pilot Corpus (ComInDat)

Audio and video recordings of various types of community interpreted discourse (doctor-patient communication, simulated doctor-patient communication, courtroom communication) in...

EXMARaLDA Demo corpus 1.1

A selection of short audio and video recordings in various languages to be used for instruction or demonstration of the EXMARaLDA system. The EXMARaLDA Demo Corpus is a small...

SimDiK

Data from the SimDiK project.

Grammatik des Polnischen

This handbook contains a comprehensive description of the modern Polish standard language and is aimed at learners of Polish as a foreign language,...

Word embeddings for Polish (KGR10, Fasttext binary) kgr10_fasttext_bin_v1

Distributional language model (binary) for Polish trained on KGR10 using Fasttext (vector dimension: 100).

PolEmo 2.0 Sentiment Analysis Dataset for CoNLL

PolEmo 2.0: Corpus of Multi-Domain Consumer Reviews, evaluation data for an article presented at CoNLL. Citation: @inproceedings{kocon-etal-2019-multi, title = "Multi-Level...

Polish-Lithuanian Parallel Corpus

Database

ChunkRel WS

ChunkRel-WS is a prototype service for recognition of three syntactic relations between chunks. The service may be run against plain text (input format: text), then the...

Polish-Russian Parallel Corpus

POLFIE: an LFG grammar of Polish

POLFIE is an LFG grammar of Polish implemented in the XLE system (Xerox Linguistic Environment). POLFIE has been developed at the Institute of Computer Science, Polish Academy...

56 datasets found