Lexicalised and Non-lexicalized Multi-word Expressions in WordNet: a Cross-encoder Approach


Focusing on the recognition of multi-word expressions (MWEs), we address the problem of recording MWEs in WordNet. Not all MWEs recorded in that lexical database can be considered lexicalised without doubt (e.g. elements of the wordnet taxonomy, quantifier phrases, certain collocations). In this paper, we use a cross-encoder approach to improve our earlier method for distinguishing between lexicalised and non-lexicalised MWEs found in WordNet, which relied on custom-designed rule-based and statistical approaches. We achieve an F1-measure close to 80% for the class of lexicalised word combinations, easily beating two baselines (a random one and a majority-class one). The language model also proves to be better than a feature-based logistic regression model.
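The abstract reports an F1-measure for a single class (lexicalised word combinations) and compares it against a random and a majority-class baseline. The minimal Python sketch below shows how per-class F1 and those two baselines could be computed. The label strings `"lex"`/`"non"`, the toy data, and the uniform random baseline are illustrative assumptions, not the authors' evaluation code or data.

```python
import random
from collections import Counter

def f1_for_class(gold, pred, positive):
    """Precision, recall and F1 for one class (e.g. the lexicalised class)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def majority_baseline(train_labels, n):
    """Predict the most frequent training label for every test item."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common] * n

def random_baseline(labels, n, seed=0):
    """Assumption: uniform choice over the observed classes (the paper's variant may differ)."""
    rng = random.Random(seed)
    classes = sorted(set(labels))
    return [rng.choice(classes) for _ in range(n)]

# Hypothetical toy data: "lex" = lexicalised MWE, "non" = non-lexicalised MWE.
gold = ["lex", "lex", "non", "non", "non", "lex"]
pred = ["lex", "non", "non", "lex", "non", "lex"]
train = ["non", "non", "lex"]

print(round(f1_for_class(gold, pred, "lex")[2], 3))          # 0.667
majority = majority_baseline(train, len(gold))               # all "non"
print(f1_for_class(gold, majority, "lex")[2])                # 0.0
rand = random_baseline(train, len(gold))
print(round(f1_for_class(gold, rand, "lex")[2], 3))
```

On this toy data, a majority-class baseline that always predicts the non-lexicalised class scores an F1 of 0 for the lexicalised class. That is why per-class F1 is a more informative measure than accuracy when the classes are imbalanced.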

Identifier
PID http://hdl.handle.net/11321/985
Metadata Access https://clarin-pl.eu/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:clarin-pl.eu:11321/985
Provenance
Creator Maziarz, Marek; Grabowski, Łukasz; Piotrowski, Tadeusz; Rudnicka, Ewa; Piasecki, Maciej
Publisher Global Wordnet Association
Publication Year 2023
Rights Creative Commons - Attribution 4.0 International (CC BY 4.0); https://creativecommons.org/licenses/by/4.0/; CC
OpenAccess true
Contact clarin-pl(at)pwr.edu.pl
Representation
Language English; Polish
Resource Type languageDescription
Format text/plain; charset=utf-8; application/pdf; downloadable_files_count: 1
Discipline Linguistics