CLARIN - Repositories

Introduction to the special issue: On wordnets and relations

The present paper is concerned with wordnets and relations.

Multisłownik: Linking plWordNet-based Lexical Data for Lexicography and Educa...

Multisłownik is an automated integrator of Polish lexical data retrieved from multiple available online sources intended to be used in various scenarios requiring access to such...

WordnetLoom – a Multilingual Wordnet Editing System Focused on Graph-based Pre...

The paper presents a new, rebuilt and expanded version 2.0 of WordnetLoom, an open wordnet editor. It facilitates work on a multilingual system of wordnets, is based on...

A collaborative system for building and maintaining wordnets

A collaborative system for wordnet construction and maintenance is presented. Its key modules include WordnetLoom editor, Wordnet Tracker and JavaScript Graph. They offer a...

The chicken-and-egg problem in wordnet design: synonymy, synsets and constitu...

Wordnets are built of synsets, not of words. A synset consists of words. Synonymy is a relation between words. Words go into a synset because they are synonyms. Later, a wordnet...

Expanding WordNet with Gloss and Polysemy Links for Evocation Strength Recogn...

Evocation (a phenomenon of sense associations going beyond standard (lexico-)semantic relations) is difficult for natural language processing systems to recognise. Machine...

Towards Mapping Thesauri onto plWordNet

plWordNet, the wordnet of Polish, has become a very comprehensive description of the Polish lexical system. This paper presents a plan for its semi-automated integration with...

Testing agreement between lexicographers: A case of homonymy and polysemy

In this paper we compare the Oxford Lexico and Merriam-Webster dictionaries with Princeton WordNet with respect to the description of semantic (dis)similarity between polysemous and...

Registers in the System of Semantic Relations in plWordNet

Lexicalised concepts are represented in wordnets by word-sense pairs. The strength of markedness is one of the factors which influence word use. Stylistically unmarked words are...

plWordNet as the Cornerstone of a Toolkit of Lexico-semantic Resources

A wordnet is many things to many people: a graph of inter-related lexicalised concepts, a taxonomy, a thesaurus, and so on. A wordnet makes good sense as the mainstay of any...

Lexicalised and Non-lexicalized Multi-word Expressions in WordNet: a Cross-enc...

Focusing on recognition of multi-word expressions (MWEs), we address the problem of recording MWEs in WordNet. In fact, not all MWEs recorded in that lexical database could with...

Adverbs in plWordNet: Theory and Implementation

Adverbs are seldom well represented in wordnets. Princeton WordNet, for example, derives from adjectives practically all its adverbs and whatever involvement they have. GermaNet...

Propagation of emotions, arousal and polarity in WordNet using Heterogeneous S...

In this paper we present a novel method for emotive propagation in a wordnet based on a large emotive seed. We introduce a sense-level emotive lexicon annotated with polarity,...

Context-sensitive Sentiment Propagation in WordNet

In this paper we present a comprehensive overview of recent methods of sentiment propagation in a wordnet. Next, we propose a fully automated method called Classifier-based...

Neural Language Models vs Wordnet-based Semantically Enriched Representation ...

Neural language models, including transformer-based models, pretrained on very large corpora have become a common way to represent text in various tasks, including...

Word Sense Disambiguation Based on Iterative Activation Spreading with Contex...

Many knowledge-based solutions have been proposed to solve the Word Sense Disambiguation (WSD) problem with limited annotated resources. Such WSD algorithms are able to cover very large...

epic-uds

This dataset has no description

Tourism Corpus TURK 3.0

The Tourism Corpus TURK 3.0 is a multilingual corpus of tourism-related texts in Slovenian, accompanied by some texts (about 6% of the corpus) in English, Italian and German....

CooccurrenceFieldSampler (CFS)

The CooccurrenceFieldSampler (CFS) was developed for sampling from corpora to facilitate lexicographical data analysis. It works with corpora from different sources, text types...

Lithuanian Hate Speech Corpus v.1

This corpus consists of (1) examples of hate speech based on ethnicity, nationality, or race, and (2) a collection of neutral comments, including both general comments and...

4,938 datasets found