WordNet-based Data Augmentation for Hybrid WSD Models

PID

Recent advances in Word Sense Disambiguation suggest neural language models can be successfully improved by incorporating knowledge base structure. Such class of models are called hybrid solutions. We propose a method of improving hybrid WSD models by harnessing data augmentation techniques and bilingual training. The data augmentation consist of structure augmentation using interlingual connections between wordnets and text data augmentation based on multilingual glosses and usage examples. We utilise language-agnostic neural model trained both with SemCor and Princeton WordNet gloss and example corpora, as well as with Polish WordNet glosses and usage examples. This augmentation technique proves to make well-known hybrid WSD architecture to be competitive, when compared to current State-of-the-Art models, even more complex.

Identifier
PID http://hdl.handle.net/11321/977
Metadata Access https://clarin-pl.eu/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:clarin-pl.eu:11321/977
Provenance
Creator Janz, Arkadiusz; Maziarz, Marek
Publisher Global Wordnet Association
Publication Year 2023
Rights Creative Commons - Attribution 4.0 International (CC BY 4.0); https://creativecommons.org/licenses/by/4.0/; CC
OpenAccess true
Contact clarin-pl(at)pwr.edu.pl
Representation
Language English
Resource Type languageDescription
Format text/plain; charset=utf-8; application/pdf; downloadable_files_count: 1
Discipline Linguistics