PARSEME corpora annotated for verbal multiword expressions (version 1.3)

This multilingual resource contains corpora in which verbal multiword expressions (VMWEs) have been manually annotated. VMWEs include idioms (let the cat out of the bag), light-verb constructions (make a decision), verb-particle constructions (give up), inherently reflexive verbs (help oneself), and multi-verb constructions (make do). This is the first release of the corpora without an associated shared task; the previous version (1.2) was associated with the PARSEME Shared Task on semi-supervised Identification of Verbal MWEs (2020). The data cover 26 languages and combine the corpora from all three previous editions (1.0, 1.1 and 1.2).

VMWEs were annotated according to the universal guidelines. The corpora are provided in the .cupt format, which is inspired by the CoNLL-U format. Morphological and syntactic information, including parts of speech, lemmas, morphological features and/or syntactic dependencies, is also provided. Depending on the language, this information comes from treebanks (e.g., Universal Dependencies) or from automatic parsers trained on treebanks (e.g., UDPipe). All corpora are split into training, development and test data, following the splitting strategy adopted for the PARSEME Shared Task 1.2.

The annotation guidelines are available online: https://parsemefr.lis-lab.fr/parseme-st-guidelines/1.3
The .cupt format is detailed here: https://multiword.sourceforge.net/cupt-format/
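As an illustration of how the VMWE annotations can be read from a .cupt file, the following is a minimal Python sketch, not an official PARSEME tool. It assumes the layout described in the format documentation linked above: tab-separated CoNLL-U columns followed by a PARSEME:MWE column, where `*` marks a token outside any VMWE, `_` marks an unannotated token, the first token of an expression carries `id:category`, later tokens carry only the `id`, and `;` separates several expressions on one token. The sample sentence and the function name `extract_vmwes` are invented for this example.

```python
# Hypothetical sample in .cupt layout (invented sentence, not taken from the corpora).
SAMPLE = """\
# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC PARSEME:MWE
# text = She made a decision
1\tShe\tshe\tPRON\t_\t_\t2\tnsubj\t_\t_\t*
2\tmade\tmake\tVERB\t_\t_\t0\troot\t_\t_\t1:LVC.full
3\ta\ta\tDET\t_\t_\t4\tdet\t_\t_\t*
4\tdecision\tdecision\tNOUN\t_\t_\t2\tobj\t_\t_\t1
"""


def extract_vmwes(cupt_text):
    """Return, for each sentence, a list of (category, [token forms]) pairs."""
    mwe_col = 10  # assumed default position of PARSEME:MWE (0-based)
    sentences, forms, cats = [], {}, {}
    seen_token = False
    # The extra empty line flushes the last sentence if no blank line follows it.
    for line in cupt_text.splitlines() + [""]:
        if line.startswith("# global.columns"):
            # Locate the annotation column from the header instead of hard-coding it.
            mwe_col = line.split("=", 1)[1].split().index("PARSEME:MWE")
        elif line.startswith("#"):
            continue  # other comment lines (e.g. "# text = ...")
        elif not line.strip():
            # A blank line closes the current sentence.
            if seen_token:
                sentences.append([(cats.get(i, "?"), forms[i]) for i in forms])
            forms, cats, seen_token = {}, {}, False
        else:
            fields = line.split("\t")
            seen_token = True
            tag = fields[mwe_col]
            if tag in ("*", "_"):
                continue  # not part of a VMWE, or not annotated
            for part in tag.split(";"):
                mwe_id, _, cat = part.partition(":")
                forms.setdefault(mwe_id, []).append(fields[1])
                if cat:
                    cats[mwe_id] = cat  # category appears on the first token only
    return sentences


if __name__ == "__main__":
    print(extract_vmwes(SAMPLE))
```

Grouping tokens by expression id rather than by adjacency keeps discontinuous expressions (here "made ... decision") together.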

Identifier
PID http://hdl.handle.net/11372/LRT-5124
Related Identifier https://aclanthology.org/2023.mwe-1.6/
Related Identifier http://hdl.handle.net/11234/1-3367
Related Identifier https://gitlab.com/parseme/corpora/-/wikis/home
Metadata Access http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11372/LRT-5124
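The Metadata Access link above is an OAI-PMH GetRecord request. As a small sketch of how such a request URL is assembled from its parts, the Python snippet below rebuilds it with the standard library. The function name `oai_get_record_url` is hypothetical; the base URL, metadata prefix and identifier are copied from the link above, and no network request is made.

```python
from urllib.parse import urlencode

# Base endpoint copied from the Metadata Access field of this record.
BASE_URL = "http://lindat.mff.cuni.cz/repository/oai/request"


def oai_get_record_url(identifier, metadata_prefix="oai_dc"):
    """Build an OAI-PMH GetRecord URL for one record identifier."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "metadataPrefix": metadata_prefix,
            "identifier": identifier,
        },
        safe=":/",  # keep ':' and '/' unescaped so the URL matches the form shown above
    )
    return f"{BASE_URL}?{query}"


RECORD_URL = oai_get_record_url("oai:lindat.mff.cuni.cz:11372/LRT-5124")

if __name__ == "__main__":
    print(RECORD_URL)
```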
Provenance
Creator Savary, Agata; Ramisch, Carlos; Guillaume, Bruno; Hawwari, Abdelati; Walsh, Abigail; Fotopoulou, Aggeliki; Bielinskienė, Agnė; Estarrona, Ainara; Gatt, Albert; Butler, Alexandra; Rademaker, Alexandre; Maldonado, Alfredo; Villavicencio, Aline; Farrugia, Alison; Muscat, Amanda; Gatt, Anabelle; Antić, Anđela; De Santis, Anna; Raffone, Annalisa; Riccio, Anna; Pascucci, Antonio; Gurrutxaga, Antton; Bhatia, Archna; Vaidya, Ashwini; Miral, Ayşenur; QasemiZadeh, Behrang; Priego Sanchez, Belem; Griciūtė, Bernadeta; Erden, Berna; Parra Escartín, Carla; Herrero, Carlos; Carlino, Carola; Pasquer, Caroline; Liebeskind, Chaya; Wang, Chenweng; Ben Khelil, Chérifa; Bonial, Claire; Somers, Clarissa; Aceta, Cristina; Krstev, Cvetana; Bejček, Eduard; Lindqvist, Ellinor; Erenmalm, Elsa; Palka-Binkiewicz, Emilia; Rimkute, Erika; Petterson, Eva; Cap, Fabienne; Hu, Fangyuan; Sangati, Federico; Wick Pedro, Gabriela; Speranza, Giulia; Jagfeld, Glorianna; Blagus, Goranka; Berk, Gözde; Attard, Greta; Eryiğit, Gülşen; Finnveden, Gustav; Martínez Alonso, Héctor; de Medeiros Caseli, Helena; Elyovich, Hevi; Xu, Hongzhi; Xiao, Huangyang; Miranda, Isaac; Jaknić, Isidora; El Maarouf, Ismail; Aduriz, Itziar; Gonzalez, Itziar; Matas, Ivana; Stoyanova, Ivelina; Jazbec, Ivo-Pavao; Busuttil, Jael; Waszczuk, Jakub; Findlay, Jamie; Bonnici, Janice; Šnajder, Jan; Antoine, Jean-Yves; Foster, Jennifer; Chen, Jia; Nivre, Joakim; Monti, Johanna; McCrae, John; Kovalevskaitė, Jolanta; Jain, Kanishka; Simkó, Katalin; Yu, Ke; Azzopardi, Kirsty; Adalı, Kübra; Uria, Larraitz; Zilio, Leonardo; Boizou, Loïc; van der Plas, Lonneke; Galea, Luke; Sarlak, Mahtab; Buljan, Maja; Cherchi, Manuela; Tanti, Marc; Di Buono, Maria Pia; Todorova, Maria; Candito, Marie; Constant, Matthieu; Shamsfard, Mehrnoush; Jiang, Menghan; Boz, Mert; Spagnol, Michael; Onofrei, Mihaela; Li, Minli; Elbadrashiny, Mohamed; Diab, Mona; Rizea, Monica-Mihaela; Hadj Mohamed, Najet; Theoxari, Natasa; Schneider, Nathan; Tabone, Nicole; Ljubešić, Nikola; Vale, Oto; Cook, Paul; Yan, Peiyi; Gantar, Polona; Ehren, Rafael; Fabri, Ray; Ibrahim, Rehab; Ramisch, Renata; Walles, Rinat; Wilkens, Rodrigo; Urizar, Ruben; Sun, Ruilong; Malka, Ruth; Galea, Sara Anne; Stymne, Sara; Louizou, Sevasti; Hu, Sha; Taslimipoor, Shiva; Ratori, Shraddha; Srivastava, Shubham; Cordeiro, Silvio Ricardo; Krek, Simon; Liu, Siyuan; Zeng, Si; Yu, Songping; Arhar Holdt, Špela; Markantonatou, Stella; Papadelli, Stella; Leseva, Svetlozara; Kuzman, Taja; Kavčič, Teja; Lynn, Teresa; Lichte, Timm; Pickard, Thomas; Dimitrova, Tsvetana; Yih, Tsy; Güngör, Tunga; Dinç, Tutkum; Iñurrieta, Uxoa; Tajalli, Vahide; Stefanova, Valentina; Caruso, Valeria; Puri, Vandana; Foufi, Vassiliki; Barbu Mititelu, Verginica; Vincze, Veronika; Kovács, Viktória; Shukla, Vishakha; Giouli, Voula; Ge, Xiaomin; Ha-Cohen Kerner, Yaakov; Öztürk, Yağmur; Yarandi, Yalda; Parmentier, Yannick; Zhang, Yongchen; Zhao, Yun; Urešová, Zdeňka; Yirmibeşoğlu, Zeynep; Qin, Zhenzhen; Stank; Cristescu, Mihaela; Zgreabăn, Bianca-Mădălina; Bărbulescu, Elena-Andreea; Stanković, Ranka
Publisher PARSEME
Publication Year 2023
Rights PARSEME Corpora v. 1.3 - Licence Agreement; https://lindat.mff.cuni.cz/repository/xmlui/page/licence-mwe-1.3; PUB
OpenAccess true
Contact lindat-help(at)ufal.mff.cuni.cz
Representation
Language Arabic; Bulgarian; Czech; German; Greek, Modern (1453-); English; Spanish (Castilian); Basque; Persian (Farsi); French; Irish; Hebrew; Hindi; Croatian; Hungarian; Lithuanian; Italian; Maltese; Polish; Portuguese; Romanian (Moldavian, Moldovan); Slovenian (Slovene); Serbian; Swedish; Turkish; Chinese
Resource Type corpus
Format text/plain; charset=utf-8; application/octet-stream; application/x-gzip; downloadable_files_count: 27
Discipline Linguistics