Discriminating Homonymy from Polysemy in Wordnets: English, Spanish and Polish Nouns

Dataset

We propose a novel method of homonymy-polysemy discrimination for three Indo-European languages: English, Spanish and Polish. Support vector machines and LASSO logistic regression were successfully applied to this task and outperformed the baselines. The feature set drew on lemma properties, gloss similarities, graph distances and polysemy patterns. The proposed ML models performed equally well on English and on the other two languages, which served as test data sets. The algorithms not only ruled out most cases of homonymy but were also effective at distinguishing closer from more indirect semantic relatedness.
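The record names the classifiers and feature groups but not how they fit together. As a rough illustration only, the sketch below frames the task as binary classification of sense pairs (1 = related senses, i.e. polysemy; 0 = unrelated senses, i.e. homonymy) and fits an L1-penalised (LASSO) logistic regression by proximal gradient descent. It covers just two of the four feature groups, using simplified stand-ins: Jaccard word overlap for gloss similarity and 1 / (1 + shortest-path length) for graph distance. The toy graph, glosses, training vectors and every function name (`gloss_jaccard`, `graph_closeness`, `train_lasso_logreg`, `features`) are hypothetical; they are not the authors' features, data or code.

```python
import math
from collections import deque

def gloss_jaccard(gloss_a: str, gloss_b: str) -> float:
    """Jaccard word overlap between two glosses (a crude stand-in for gloss similarity)."""
    a, b = set(gloss_a.lower().split()), set(gloss_b.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def graph_closeness(graph: dict, src: str, dst: str) -> float:
    """1 / (1 + shortest-path length) found by BFS; 0.0 if dst is unreachable from src."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return 1.0 / (1 + dist)
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return 0.0

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v: float, t: float) -> float:
    # Proximal operator of t * |v|, i.e. the L1 (LASSO) penalty.
    return math.copysign(max(abs(v) - t, 0.0), v)

def train_lasso_logreg(X, y, lam=0.01, lr=1.0, epochs=20000):
    """L1-penalised logistic regression fitted by proximal gradient descent (ISTA)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # Gradient step on the mean log-loss, then soft-thresholding for the L1 term.
        # The bias is left unpenalised.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x) -> int:
    # 1 = related senses (polysemy), 0 = unrelated senses (homonymy).
    return int(sum(wj * xj for wj, xj in zip(w, x)) + b >= 0)

# Hypothetical toy graph and glosses, not taken from any real wordnet.
graph = {
    "bank.river": ["slope"],
    "slope": ["bank.river"],
    "bank.institution": ["financial_institution"],
    "financial_institution": ["bank.institution", "bank.building"],
    "bank.building": ["financial_institution"],
}
glosses = {
    "bank.river": "sloping land beside a body of water",
    "bank.institution": "a financial institution that accepts deposits",
    "bank.building": "a building in which a financial institution does business",
}

def features(s1: str, s2: str) -> list:
    # Hypothetical two-feature vector: [gloss similarity, graph closeness].
    return [gloss_jaccard(glosses[s1], glosses[s2]), graph_closeness(graph, s1, s2)]

# Synthetic training vectors [gloss similarity, graph closeness] with labels.
X = [[0.30, 0.50], [0.25, 0.33], [0.40, 1.00], [0.05, 0.00], [0.08, 0.00], [0.00, 0.10]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_lasso_logreg(X, y)
print(predict(w, b, features("bank.institution", "bank.building")))  # related senses
print(predict(w, b, features("bank.river", "bank.institution")))     # unrelated senses
```

The hand-written solver only keeps the example dependency-free; a library implementation such as scikit-learn's `LogisticRegression(penalty="l1", solver="liblinear")` would normally take its place, and the L1 penalty is what lets uninformative features be driven to zero weight.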

Identifier
PID	http://hdl.handle.net/11321/974
Metadata Access	https://clarin-pl.eu/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:clarin-pl.eu:11321/974

Provenance
Creator	Janz, Arkadiusz; Maziarz, Marek
Publisher	GWC
Publication Year	2021
Rights	Creative Commons Attribution 4.0 International (CC BY 4.0); https://creativecommons.org/licenses/by/4.0/
OpenAccess	true
Contact	clarin-pl(at)pwr.edu.pl

Representation
Language	English; Polish; Spanish (Castilian)
Resource Type	languageDescription
Format	text/plain; charset=utf-8; application/pdf (1 downloadable file)
Discipline	Linguistics