Tourism Corpus TURK 3.0
The Tourism Corpus TURK 3.0 is a multilingual corpus of tourism-related texts in Slovenian, accompanied by some texts (about 6% of the corpus) in English, Italian and German....
CooccurrenceFieldSampler (CFS)
The CooccurrenceFieldSampler (CFS) was developed for sampling from corpora to facilitate lexicographical data analysis. It works with corpora from different sources, text types...
Lithuanian Hate Speech Corpus v.1
This corpus consists of (1) examples of hate speech based on ethnicity, nationality, or race, and (2) a collection of neutral comments, including both general comments and...
WordNet-based Data Augmentation for Hybrid WSD Models
Recent advances in Word Sense Disambiguation suggest that neural language models can be successfully improved by incorporating knowledge-base structure. This class of models is...
Discriminating Homonymy from Polysemy in Wordnets: English, Spanish and Polis...
We propose a novel method of homonymy-polysemy discrimination for three Indo-European languages (English, Spanish and Polish). Support vector machines and LASSO logistic...
Testing Zipf’s meaning-frequency law with wordnets as sense inventories
According to George K. Zipf, more frequent words have more senses. We have tested this law using corpora and wordnets of English, Spanish, Portuguese, French, Polish, Japanese,...
Extraction and description of multi-word lexical units in plWordNet 3.0
In this paper, we present methods for extracting multi-word lexical units (MWLUs) from large text corpora and describing them in plWordNet 3.0. MWLUs are filtered from...
Enriching plWordNet with morphology
In this paper, we present the process of adding morphological information to the Polish WordNet (plWordNet). We describe the reasons for this connection and the intuitions behind...
Wordnet – a Basic Resource for Natural Language Processing: the Case of plWor...
This paper presents a wide range of wordnet applications, taking as its example the applications of plWordNet, a wordnet of Polish. Wordnets are large lexical-semantic databases...
plWordNet 4.1 – a Linguistically Motivated, Corpus-based Bilingual Resource
The paper presents the latest release of the Polish WordNet, namely plWordNet 4.1. The most significant developments since version 3.0 include new relations for nouns and verbs,...
Terminology in WordNet and in plWordNet
We examine the strategies for organizing terminological information in WordNet, and describe an analogous strategy of adding terminological senses of lexical units to plWordNet,...
Addenda to the inventory of female names in Słowosieć: The case of biskupka ‘...
Due to the dynamic social discussion and the observed increase in the use of feminatives, we deemed it appropriate to modify the current way of describing these units in the...
The lexicographic description of feminine forms in plWordNet: the current sta...
The aim of the study is to present a method of describing feminine forms (nouns referring to humans with female gender) in plWordNet and to indicate possible directions of its...
MultiCo-Hub: a corpus of multimodal enrichments with motion-trajectory annota...
MultiCo-Hub is a multimodal dataset including 11 zipped subsets (henceforth: sessions) of time-aligned audio, video and motion-capture–derived BVH data, together with...
Slovene instruction-following dataset for large language models GaMS-Instruct...
GaMS-Instruct-MED is an instruction-following dataset designed to fine-tune Slovene large language models to follow instructions in the medical domain. It consists of units of...
Parallel sense-annotated corpus ELEXIS-WSD 1.3
ELEXIS-WSD is a parallel sense-annotated corpus in which content words (nouns, adjectives, verbs, and adverbs) have been assigned senses. Version 1.3 contains sentences for 10...
Corpus of Transcriptions - part 2
The second part of the Corpus of Transcriptions contains phonemic transcriptions of a short passage from Lecumberri and Maidment (2000, p. 78) performed by the undergraduate...
plWordNet 5.0 – challenges of a life-long wordnet development process
The construction of plWordNet began in 2005 and has continued ever since. In this paper we present the latest version, 5.0, and describe the challenges connected with a...
DiPSS - longitudinal corpus of drift in Polish students of Spanish
The DiPSS corpus (part 1) is a longitudinal speech resource documenting the phonetic productions of L1 Polish students learning L2 English and L3 Spanish. It includes recordings...
HANOI corpus and tool for analysis of note-taking of conference interpreters
HANOI is a resource for understanding the process of consecutive interpreting through the analysis of the note-taking process. Each data package is a record of an interpretation...
