Concreteness and imageability lexicon MEGA.HR-Crossling

The lexicon contains predicted concreteness and imageability scores for words in 77 languages. The resource is built via supervised machine learning, using average human responses collected for Croatian lexemes within the MEGAHR project (http://megahr.ffzg.unizg.hr) as the response variable and the Facebook cross-lingual word embeddings (https://github.com/Babylonpartners/fastText_multilingual) as explanatory variables. Because the embeddings of all languages share one vector space, a model trained on Croatian can be applied to words of the other languages. The Spearman correlation between human responses and automatic annotations on the Croatian-English language pair is ~0.8 for concreteness and ~0.7 for imageability.
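
The transfer setup described above can be illustrated with a minimal sketch. It is not the author's pipeline: the regression model (closed-form ridge regression without an intercept), the toy 5-dimensional vectors, and all names such as fit_ridge, toy_language and TRUE_W are assumptions made for illustration only. Synthetic data stands in for the Croatian ratings and for a second language embedded in the same space. The sketch fits the model on the "source" vectors, predicts scores for the "target" vectors, and evaluates the predictions with Spearman correlation, the measure reported for this resource.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression: w = (X^T X + lam * I)^-1 X^T y."""
    d = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    return solve(A, b)


def predict(w, X):
    """Dot product of the weight vector with every row of X."""
    return [sum(wi * xi for wi, xi in zip(w, row)) for row in X]


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(a, b):
    """Spearman correlation = Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / (va * vb) ** 0.5


# Hypothetical toy data: both "languages" live in one shared vector space
# and the rating depends on the vector in the same way (plus noise).
rng = random.Random(0)
TRUE_W = [1.0, -0.5, 0.8, 0.3, -1.2]


def toy_language(n):
    X = [[rng.gauss(0, 1) for _ in TRUE_W] for _ in range(n)]
    y = [sum(w * x for w, x in zip(TRUE_W, row)) + rng.gauss(0, 0.5)
         for row in X]
    return X, y


X_src, y_src = toy_language(200)  # stand-in for rated source-language words
X_tgt, y_tgt = toy_language(100)  # stand-in for words of another language

w = fit_ridge(X_src, y_src)
rho = spearman(predict(w, X_tgt), y_tgt)
print("Spearman correlation on the toy target language:", round(rho, 3))
```

The transfer works in this sketch only because source and target vectors are drawn from the same space, which mirrors the role of the aligned cross-lingual embeddings; with unaligned per-language embeddings the learned weights would not carry over.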

Identifier
PID http://hdl.handle.net/11356/1187
Related Identifier https://arxiv.org/abs/1807.02903
Related Identifier https://github.com/clarinsi/megahr-crossling
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1187
Provenance
Creator Ljubešić, Nikola
Publisher Jožef Stefan Institute; Faculty of Humanities and Social Sciences, University of Zagreb
Publication Year 2018
Rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0); https://creativecommons.org/licenses/by-sa/4.0/; PUB
OpenAccess true
Contact info(at)clarin.si
Representation
Language Afrikaans; Arabic; Azerbaijani; Belarusian; Bulgarian; Bengali (Bangla); Bosnian; Catalan (Valencian); Cebuano; Czech; Welsh; Danish; German; Modern Greek (1453-); English; Esperanto; Spanish (Castilian); Estonian; Basque; Persian (Farsi); Finnish; French; Western Frisian; Galician; Gujarati; Hebrew; Hindi; Croatian; Hungarian; Armenian; Indonesian; Icelandic; Italian; Japanese; Georgian; Kazakh; Central Khmer (Khmer); Kannada; Korean; Kirghiz (Kyrgyz); Latin; Luxembourgish (Letzeburgesch); Lithuanian; Latvian; Malagasy; Macedonian; Malayalam; Mongolian; Marathi (Marāṭhī); Malay; Burmese; Nepali; Dutch (Flemish); Norwegian; Panjabi (Punjabi); Polish; Portuguese; Romanian (Moldavian, Moldovan); Russian; Sinhala (Sinhalese); Slovak; Slovenian (Slovene); Albanian; Serbian; Swedish; Tamil; Telugu; Tajik; Thai; Tagalog; Turkish; Ukrainian; Urdu; Uzbek; Vietnamese; Chinese
Resource Type lexicalConceptualResource
Format application/zip; text/plain; charset=utf-8; downloadable_files_count: 1
Discipline Linguistics