Lexicalised and Non-lexicalized Multi-word Expressions in WordNet: a Cross-encoder Approach

Dataset

Focusing on the recognition of multi-word expressions (MWEs), we address the problem of recording MWEs in WordNet. Not all MWEs recorded in that lexical database can be considered lexicalised beyond doubt (e.g. elements of the wordnet taxonomy, quantifier phrases, certain collocations). In this paper, we use a cross-encoder approach to improve on our earlier method, which relied on custom-designed rule-based and statistical approaches, for distinguishing between lexicalised and non-lexicalised MWEs found in WordNet. We achieve an F1-measure close to 80% for the class of lexicalised word combinations, easily beating two baselines (a random one and a majority-class one). The language model also proves to be better than a feature-based logistic regression model.
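
To make the evaluation setup in the description concrete, the following is a minimal sketch of how a per-class F1-measure can be computed for a majority-class and a random baseline. The labels `lex` and `nonlex`, the toy gold list, and all function names are assumptions made for illustration; they are not taken from the paper or its data, and the numbers produced say nothing about the reported results.

```python
import random


def f1_for_class(gold, pred, positive):
    """F1 of one class, computed from true/false positives and false negatives."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        return 0.0  # no correct positive prediction: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def majority_baseline(gold):
    """Predict the most frequent gold label for every item."""
    labels = sorted(set(gold))
    majority = max(labels, key=gold.count)
    return [majority] * len(gold)


def random_baseline(gold, seed=0):
    """Predict a uniformly random label for every item (seeded for repeatability)."""
    rng = random.Random(seed)
    labels = sorted(set(gold))
    return [rng.choice(labels) for _ in gold]


# Hypothetical toy gold standard: 6 lexicalised and 4 non-lexicalised MWEs.
gold = ["lex"] * 6 + ["nonlex"] * 4

majority_f1 = f1_for_class(gold, majority_baseline(gold), positive="lex")
random_f1 = f1_for_class(gold, random_baseline(gold), positive="lex")

print(f"majority-class baseline F1 (lex): {majority_f1:.2f}")
print(f"random baseline F1 (lex): {random_f1:.2f}")
```

On this toy list the majority-class baseline labels every item as `lex`, so its recall is 1.0 and its precision is 0.6, giving an F1 of 0.75 for that class. A classifier is compared against such baselines by passing its own predictions to `f1_for_class` in the same way.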

Identifier
PID	http://hdl.handle.net/11321/985
Metadata Access	https://clarin-pl.eu/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:clarin-pl.eu:11321/985
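
The Metadata Access link above is an OAI-PMH `GetRecord` request. As a small sketch, the same URL can be assembled from its parts with the Python standard library; the helper name `get_record_url` is hypothetical, while the base URL, identifier, and `oai_dc` prefix are copied from the record itself.

```python
from urllib.parse import urlencode

# Base endpoint as it appears in the Metadata Access field of this record.
BASE_URL = "https://clarin-pl.eu/oai/request"


def get_record_url(identifier, metadata_prefix="oai_dc"):
    """Build an OAI-PMH GetRecord request URL (hypothetical helper)."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "metadataPrefix": metadata_prefix,
            "identifier": identifier,
        },
        safe=":/",  # keep ':' and '/' unescaped, as in the record's own link
    )
    return f"{BASE_URL}?{query}"


url = get_record_url("oai:clarin-pl.eu:11321/985")
print(url)
```

Fetching the resulting URL with any HTTP client should return the Dublin Core record as XML, assuming the endpoint is still available.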

Provenance
Creator	Maziarz, Marek; Grabowski, Łukasz; Piotrowski, Tadeusz; Rudnicka, Ewa; Piasecki, Maciej
Publisher	Global Wordnet Association
Publication Year	2023
Rights	Creative Commons - Attribution 4.0 International (CC BY 4.0); https://creativecommons.org/licenses/by/4.0/; CC
OpenAccess	true
Contact	clarin-pl(at)pwr.edu.pl

Representation
Language	English; Polish
Resource Type	languageDescription
Format	text/plain; charset=utf-8; application/pdf; downloadable_files_count: 1
Discipline	Linguistics