Dataset - B2FIND

WordNet-based Data Augmentation for Hybrid WSD Models

Recent advances in Word Sense Disambiguation suggest neural language models can be successfully improved by incorporating knowledge base structure. Such a class of models is...

Discriminating Homonymy from Polysemy in Wordnets: English, Spanish and Polis...

We propose a novel method of homonymy-polysemy discrimination for three Indo-European languages (English, Spanish and Polish). Support vector machines and LASSO logistic...

Testing Zipf’s meaning-frequency law with wordnets as sense inventories

According to George K. Zipf, more frequent words have more senses. We have tested this law using corpora and wordnets of English, Spanish, Portuguese, French, Polish, Japanese,...

Extraction and description of multi-word lexical units in plWordNet 3.0

In this paper, we present methods of extraction of multi-word lexical units (MWLUs) from large text corpora and their description in plWordNet 3.0. MWLUs are filtered from...

Enriching plWordNet with morphology

In the paper, we present the process of adding morphological information to the Polish WordNet (plWordNet). We describe the reasons for this connection and the intuitions behind...

Wordnet – a Basic Resource for Natural Language Processing: the Case of plWor...

This paper presents a wide range of wordnet applications, using plWordNet, a wordnet of Polish, as an example. Wordnets are large lexical-semantic databases...

plWordNet 4.1 – a Linguistically Motivated, Corpus-based Bilingual Resource

The paper presents the latest release of the Polish WordNet, namely plWordNet 4.1. The most significant developments since version 3.0 include new relations for nouns and verbs,...

Terminology in WordNet and in plWordNet

We examine the strategies of organizing terminological information in WordNet, and describe an analogous strategy of adding terminological senses of lexical units to plWordNet,...

Addenda to the inventory of female names in Słowosieć: The case of biskupka ‘...

Due to the dynamic social discussion and the observed increase in the use of feminatives, we deemed it appropriate to modify the current way of describing these units in the...

The lexicographic description of feminine forms in plWordNet: the current sta...

The aim of the study is to present a method of describing feminine forms (nouns referring to humans with female gender) in plWordNet and to indicate possible directions of its...

Slovene instruction-following dataset for large language models GaMS-Instruct...

GaMS-Instruct-MED is an instruction-following dataset designed to fine-tune Slovene large language models to follow instructions in the medical domain. It consists of units of...

Parallel sense-annotated corpus ELEXIS-WSD 1.3

ELEXIS-WSD is a parallel sense-annotated corpus in which content words (nouns, adjectives, verbs, and adverbs) have been assigned senses. Version 1.3 contains sentences for 10...

plWordNet 5.0 – challenges of a life-long wordnet development process

The construction of plWordNet began in 2005 and has continued ever since. In this paper we present the latest 5.0 version and describe the challenges connected with a...

Czech PDT-C 2.0 Model for UDPipe 2 (2025-10-25)

Tokenizer, POS Tagger, Lemmatizer, and Parser model based on the PDT-C 2.0 treebank (http://hdl.handle.net/11234/1-5813). The model documentation including performance can be...

Universal Dependencies 2.17 models for UDPipe 2 (2025-11-25)

Tokenizer, POS Tagger, Lemmatizer and Parser models for 169 treebanks of 93 languages of Universal Dependencies 2.17 Treebanks, created solely using UD 2.17 data...

Slovene-Japanese Learner's Dictionary sloJa 1.1

The Slovenian-Japanese online dictionary for Slovenian-speaking learners of Japanese was compiled by extracting and converting the Japanese-Slovenian dictionary jaSlo 3.1...

Dataset of annotated headword-synonym-distractor triplets SYNDIST

The dataset contains 51,023 headword-synonym-distractor triplets for 5,000 headwords. A distractor is defined as an incorrect answer/alternative to the synonym, which can be similar...

SFU Opinion and Comments Corpus (SOCC) for NoSketch Engine

The SFU Opinion and Comments Corpus (SOCC) is a corpus for the analysis of online news comments. It contains opinionated articles and comments. It was tagged using TreeTagger...

StarwarsNER French Italian Corpus - sample

The StarwarsNER French Italian Corpus - sample is a multilingual benchmark resource for Named Entity Recognition (NER) in the wastewater and stormwater management domain. It...

1,439 datasets found