Multilingual dataset of COVID tweets for relation-level metaphor analysis TCMeta 1.0

TCMeta is a dataset of noun phrase constructions from COVID-related tweets, annotated for relation-level metaphor.

It contains 2,138 Slovene and 2,221 English instances in tab-separated (.tsv) format, where each line presents a unique phrase under consideration, extracted from a COVID-related tweet. The primary annotation is the COVID metaphor label (whether the phrase expresses a metaphor relating to COVID). Additional annotations mark idioms, metaphors not relating to COVID, and metaphors not evident at the relation level.
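As a minimal sketch of how such a file might be read and the label distribution inspected, the Python snippet below uses only the standard library. It assumes the .tsv file has a header row; the file name and the column name `covid_metaphor` shown in the usage comment are hypothetical placeholders, so check the actual header of the downloaded files before use.

```python
import csv
from collections import Counter
from typing import Iterable


def read_tsv(lines: Iterable[str]) -> list[dict[str, str]]:
    """Parse tab-separated lines (header row assumed) into a list of dicts."""
    # QUOTE_NONE keeps stray quote characters in tweet-derived phrases intact.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def count_values(rows: list[dict[str, str]], column: str) -> Counter:
    """Count how often each value occurs in the given column."""
    return Counter(row[column] for row in rows)


# Usage (file name and column name are assumptions, not taken from the dataset):
#
# with open("tcmeta_en.tsv", encoding="utf-8", newline="") as handle:
#     rows = read_tsv(handle)
# print(len(rows))
# print(count_values(rows, "covid_metaphor"))
```

All values are kept as strings, so label values and identifiers are not silently converted to numbers.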

The full text of the tweets could not be published because of the terms of service (ToS) of the platform then known as Twitter. We recommend retrieving the tweet texts via their IDs using the Hydrator tool [https://github.com/docnow/hydrator] or a similar tool.
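To prepare input for such a tool, the tweet IDs can be collected into a plain list with one ID per line, which is the kind of input Hydrator is generally used with. The sketch below is illustrative only: it assumes the .tsv file has a header row and that the ID column is called `tweet_id`, a hypothetical name that should be replaced with the actual column header. The file names in the usage comment are likewise placeholders.

```python
import csv
from typing import Iterable


def collect_tweet_ids(lines: Iterable[str], id_column: str = "tweet_id") -> list[str]:
    """Return the unique tweet IDs from tab-separated lines, in first-seen order.

    `id_column` is a hypothetical column name; adjust it to the real header.
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    seen: dict[str, None] = {}
    for row in reader:
        # IDs stay strings: tweet IDs are large integers and should not be
        # converted to floats or otherwise reformatted.
        tweet_id = (row.get(id_column) or "").strip()
        if tweet_id:
            seen.setdefault(tweet_id, None)
    return list(seen)


def format_id_list(tweet_ids: Iterable[str]) -> str:
    """Format IDs as text with one ID per line."""
    return "".join(f"{tweet_id}\n" for tweet_id in tweet_ids)


# Usage (file names are assumptions):
#
# with open("tcmeta_en.tsv", encoding="utf-8", newline="") as handle:
#     ids = collect_tweet_ids(handle)
# with open("tweet_ids.txt", "w", encoding="utf-8") as out:
#     out.write(format_id_list(ids))
```

Several phrases may come from the same tweet, so the IDs are deduplicated before being written out.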

The dataset is further described in: Brglez, M., Zayed, O. & Buitelaar, P. TCMeta: a multilingual dataset of COVID tweets for relation-level metaphor analysis. Lang Resources & Evaluation 59, 437–475 (2025). https://doi.org/10.1007/s10579-024-09725-z.

@article{brglez2025tcmeta,
  title={{TCMeta}: a multilingual dataset of {COVID} tweets for relation-level metaphor analysis},
  author={Brglez, Mojca and Zayed, Omnia and Buitelaar, Paul},
  journal={Language Resources and Evaluation},
  volume={59},
  pages={437--475},
  year={2025},
  publisher={Springer},
  doi={10.1007/s10579-024-09725-z}
}

Identifier
PID http://hdl.handle.net/11356/1787
Related Identifier https://doi.org/10.1007/s10579-024-09725-z
Related Identifier https://doi.org/10.5281/zenodo.16921580
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1787
Provenance
Creator Brglez, Mojca; Zayed, Omnia; Buitelaar, Paul
Publisher Faculty of Arts, University of Ljubljana
Publication Year 2023
Funding Reference info:eu-repo/grantAgreement/EC/H2020/883285
Rights Creative Commons - Attribution 4.0 International (CC BY 4.0); https://creativecommons.org/licenses/by/4.0/; PUB
OpenAccess true
Contact info(at)clarin.si
Representation
Language English; Slovene
Resource Type corpus
Format text/plain; application/octet-stream; text/plain; charset=utf-8; downloadable_files_count: 2
Discipline Linguistics