TCMeta is a dataset of noun phrase constructions from COVID-related tweets, annotated for relation-level metaphor.
It contains 2,138 Slovene and 2,221 English instances in tab-separated (.tsv) format. Each line presents a unique phrase extracted from a COVID-related tweet. The primary annotation is the COVID metaphor label (whether the phrase expresses a metaphor relating to COVID). Additional annotations mark idioms, metaphors not relating to COVID, and metaphors not evident at the relation level.
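As a minimal sketch of reading one of the files, assuming it starts with a header row (not confirmed in this description), the Python standard library is enough. The function name load_tsv and the file name in the usage comment are hypothetical. Check the released files for the actual names and columns.

```python
import csv


def load_tsv(path):
    """Read a tab-separated file with a header row into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE keeps stray quote characters in tweet-derived text as-is
        # instead of treating them as CSV quoting.
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        return list(reader)


# Usage (hypothetical file name):
# rows = load_tsv("tcmeta_en.tsv")
# print(len(rows), list(rows[0].keys()))
```

Each row is a dictionary keyed by the column names found in the header, so the annotation columns can be inspected without hard-coding their names.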
The full text of the tweets could not be published due to the terms of service (ToS) of the platform then known as Twitter. We recommend retrieving the tweet text via the tweet IDs using the Hydrator tool [https://github.com/docnow/hydrator] or a similar tool.
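Once the tweets are retrieved, the text can be joined back to the dataset rows by ID. The sketch below makes several assumptions to verify against your own files: the hydrated tweets are stored as JSON Lines with one tweet object per line, the ID is in "id_str" (or "id") and the text in "full_text" (or "text"), and the dataset rows are dictionaries (for example from csv.DictReader) with the ID in a column hypothetically named "tweet_id".

```python
import json


def load_hydrated(jsonl_path):
    """Map tweet ID (as a string) to tweet text from a JSON Lines file."""
    texts = {}
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            tweet = json.loads(line)
            # Assumed field names; adjust them to the actual export.
            tweet_id = str(tweet.get("id_str") or tweet.get("id"))
            texts[tweet_id] = tweet.get("full_text") or tweet.get("text")
    return texts


def attach_text(rows, texts, id_column="tweet_id"):
    """Return copies of rows with a 'tweet_text' key (None if unavailable)."""
    # "tweet_id" is a hypothetical column name; pass the real one if it differs.
    return [{**row, "tweet_text": texts.get(str(row[id_column]))} for row in rows]
```

IDs are compared as strings because tweet IDs are large integers that some tools round when parsing them as floating-point numbers. Rows whose tweets have been deleted or made private since collection get None as their text.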
The dataset is further described in:
Brglez, M., Zayed, O. & Buitelaar, P. TCMeta: a multilingual dataset of COVID tweets for relation-level metaphor analysis. Lang Resources & Evaluation 59, 437–475 (2025). https://doi.org/10.1007/s10579-024-09725-z.
@article{brglez2025tcmeta,
title={{TCMeta}: a multilingual dataset of {COVID} tweets for relation-level metaphor analysis},
author={Brglez, Mojca and Zayed, Omnia and Buitelaar, Paul},
journal={Language Resources and Evaluation},
pages={437--475},
volume={59},
year={2025},
publisher={Springer},
doi = {10.1007/s10579-024-09725-z}
}