Multilingual dataset of COVID tweets for relation-level metaphor analysis TCMeta 1.0

Dataset


TCMeta is a dataset of noun phrase constructions from COVID-related tweets, annotated for relation-level metaphor.

It contains 2,138 Slovene and 2,221 English instances in tab-separated (.tsv) format, where each line represents a unique phrase extracted from a COVID-related tweet. The primary annotation is the COVID metaphor label, which indicates whether the phrase expresses a metaphor relating to COVID. Additional labels mark idioms, metaphors not relating to COVID, and metaphors that are not evident at the relation level.
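The following is a minimal sketch for loading the instances with Python's standard `csv` module and counting the label distribution. It assumes that each .tsv file has a header row and that the COVID metaphor label is stored in a column named `covid_metaphor`. That column name is hypothetical, so check the actual header of the downloaded file first.

```python
import csv
from collections import Counter

# Hypothetical column name: the real TSV header may use a different one.
LABEL_COLUMN = "covid_metaphor"


def load_instances(path):
    """Read a tab-separated file into a list of dicts, one per phrase (line)."""
    with open(path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE: quotation marks inside phrases are kept as ordinary text.
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        return list(reader)


def label_distribution(rows, column=LABEL_COLUMN):
    """Count how many instances carry each value of the given label column."""
    return Counter(row[column] for row in rows)
```

For example, `label_distribution(load_instances("tcmeta_en.tsv"))` would tally the labels in the English file. Here `tcmeta_en.tsv` is a placeholder file name. Passing a different `column` argument counts one of the additional labels in the same way.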

The full text of the tweets could not be published due to the Terms of Service of the Twitter platform at the time. We recommend retrieving the tweet texts via their IDs using the Hydrator tool [https://github.com/docnow/hydrator] or a similar tool.
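Hydrator takes a plain-text file of tweet IDs, one per line, as its input. The sketch below builds such a file from a TCMeta .tsv file. It assumes a header row in which the tweet ID is stored in a column named `tweet_id`. That name is hypothetical, so adjust it to match the actual header.

```python
import csv

# Hypothetical column name for the tweet ID; adjust to the actual TSV header.
ID_COLUMN = "tweet_id"


def write_id_file(tsv_path, out_path, id_column=ID_COLUMN):
    """Write each distinct tweet ID from the TSV to out_path, one per line.

    IDs are deduplicated in case several phrases come from the same tweet,
    and their first-seen order is preserved. Returns the number of IDs written.
    """
    seen = {}  # dict keys keep insertion order, giving an ordered set
    with open(tsv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            tweet_id = row[id_column].strip()
            if tweet_id:
                seen.setdefault(tweet_id, None)
    with open(out_path, "w", encoding="utf-8") as out:
        for tweet_id in seen:
            out.write(tweet_id + "\n")
    return len(seen)
```

For example, `write_id_file("tcmeta_sl.tsv", "tcmeta_sl_ids.txt")` would produce an ID file for the Slovene data. Both file names are placeholders. The IDs are kept as strings throughout because tweet IDs are 64-bit integers, and tools that parse them as floating-point numbers silently lose precision.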

The dataset is further described in: Brglez, M., Zayed, O. & Buitelaar, P. TCMeta: a multilingual dataset of COVID tweets for relation-level metaphor analysis. Lang Resources & Evaluation 59, 437–475 (2025). https://doi.org/10.1007/s10579-024-09725-z.

@article{brglez2025tcmeta,
  title     = {{TCMeta}: a multilingual dataset of {COVID} tweets for relation-level metaphor analysis},
  author    = {Brglez, Mojca and Zayed, Omnia and Buitelaar, Paul},
  journal   = {Language Resources and Evaluation},
  volume    = {59},
  pages     = {437--475},
  year      = {2025},
  publisher = {Springer},
  doi       = {10.1007/s10579-024-09725-z}
}

Identifier
PID	http://hdl.handle.net/11356/1787
Related Identifier	https://doi.org/10.1007/s10579-024-09725-z
Related Identifier	https://doi.org/10.5281/zenodo.16921580
Metadata Access	http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1787

Provenance
Creator	Brglez, Mojca; Zayed, Omnia; Buitelaar, Paul
Publisher	Faculty of Arts, University of Ljubljana
Publication Year	2023
Funding Reference	info:eu-repo/grantAgreement/EC/H2020/883285
Rights	Creative Commons - Attribution 4.0 International (CC BY 4.0); https://creativecommons.org/licenses/by/4.0/; PUB
OpenAccess	true
Contact	info(at)clarin.si

Representation
Language	English; Slovenian
Resource Type	corpus
Format	text/plain; application/octet-stream; text/plain; charset=utf-8; downloadable_files_count: 2
Discipline	Linguistics