WordNet-based Data Augmentation for Hybrid WSD Models

Dataset

Recent advances in Word Sense Disambiguation (WSD) suggest that neural language models can be improved by incorporating knowledge base structure. Models of this class are called hybrid solutions. We propose a method of improving hybrid WSD models by harnessing data augmentation techniques and bilingual training. The data augmentation consists of structure augmentation, using interlingual connections between wordnets, and text data augmentation, based on multilingual glosses and usage examples. We utilise a language-agnostic neural model trained on SemCor and on the Princeton WordNet gloss and example corpora, as well as on Polish WordNet glosses and usage examples. This augmentation technique makes a well-known hybrid WSD architecture competitive with current state-of-the-art models, including more complex ones.

Identifier
PID	http://hdl.handle.net/11321/977
Metadata Access	https://clarin-pl.eu/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:clarin-pl.eu:11321/977

Provenance
Creator	Janz, Arkadiusz; Maziarz, Marek
Publisher	Global Wordnet Association
Publication Year	2023
Rights	Creative Commons - Attribution 4.0 International (CC BY 4.0); https://creativecommons.org/licenses/by/4.0/; CC
OpenAccess	true
Contact	clarin-pl(at)pwr.edu.pl

Representation
Language	English
Resource Type	languageDescription
Format	text/plain; charset=utf-8; application/pdf; downloadable_files_count: 1
Discipline	Linguistics