Ekspress user comment dataset 1.0


This dataset is an archive of reader comments on the Ekspress Meedia news site from 2009 to 2019. It contains approximately 31M comments, mostly in Estonian, with some in Russian.

Description of the Datasets.

There are 11 CSV files:
comments_2009.csv contains 2 898 438 comments from the year 2009
comments_2010.csv contains 2 377 591 comments from the year 2010
comments_2011.csv contains 2 729 389 comments from the year 2011
comments_2012.csv contains 3 372 776 comments from the year 2012
comments_2013.csv contains 3 289 393 comments from the year 2013
comments_2014.csv contains 3 195 502 comments from the year 2014
comments_2015.csv contains 3 202 592 comments from the year 2015
comments_2016.csv contains 2 848 624 comments from the year 2016
comments_2017.csv contains 2 838 075 comments from the year 2017
comments_2018.csv contains 3 194 597 comments from the year 2018
comments_2019.csv contains 1 526 755 comments from the year 2019 (through May)
In sum: 31 473 732 comments
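The total can be checked against the per-file counts listed above. The following is a minimal sketch that only re-adds the figures from this description; the names COMMENTS_PER_FILE and total_comments are illustrative and not part of the dataset.

```python
# Per-file comment counts, copied from the dataset description above.
COMMENTS_PER_FILE = {
    "comments_2009.csv": 2_898_438,
    "comments_2010.csv": 2_377_591,
    "comments_2011.csv": 2_729_389,
    "comments_2012.csv": 3_372_776,
    "comments_2013.csv": 3_289_393,
    "comments_2014.csv": 3_195_502,
    "comments_2015.csv": 3_202_592,
    "comments_2016.csv": 2_848_624,
    "comments_2017.csv": 2_838_075,
    "comments_2018.csv": 3_194_597,
    "comments_2019.csv": 1_526_755,
}


def total_comments(counts):
    """Return the sum of all per-file comment counts."""
    return sum(counts.values())


if __name__ == "__main__":
    total = total_comments(COMMENTS_PER_FILE)
    print(f"{len(COMMENTS_PER_FILE)} files, {total} comments")
```

The same dictionary could serve as a sanity check after download, by comparing each file's row count with the listed figure.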

Columns:
comment_id (string) - the ID of the written comment
article_id (string) - the ID of the article for which the comment was written
created_time (string) - the time and date of the comment
subject (string) - the title of the comment
reply_to_comment_id (string) - the parent comment's ID
content (string) - the comment itself
is_anonymous (string) - 1 if the comment was published anonymously, 0 if it was published by a registered user
is_enabled (string) - 1 if the comment was published (online), 0 if it was not. Questionable field: not all comments have been manually moderated, and no additional information is available from the moderators.
channel_language (string) - the language of the channel: 'nat' for Estonian, 'rus' for Russian
create_user_id (string) - the user ID of the commenter; '0' for all blocked comments
moderated_by (string) - the ID of the moderator
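Since every column is stored as a string, the 0/1 flags need converting before they can be used for filtering. The sketch below uses only the Python standard library and assumes, without confirmation from the description, that the files are comma-delimited, UTF-8 encoded, and start with a header row carrying the column names listed above. The function names and the three sample rows (including their timestamp format) are invented for illustration and are not taken from the dataset.

```python
import csv
import io


def read_comments(handle):
    """Yield one dict per comment, converting the 0/1 flag columns to bool.

    Assumes a header row with the column names from the description.
    """
    for row in csv.DictReader(handle):
        row["is_anonymous"] = row["is_anonymous"] == "1"
        row["is_enabled"] = row["is_enabled"] == "1"
        yield row


def published_by_language(rows):
    """Count published comments (is_enabled) per channel_language value."""
    counts = {}
    for row in rows:
        if row["is_enabled"]:
            lang = row["channel_language"]
            counts[lang] = counts.get(lang, 0) + 1
    return counts


# Invented sample rows, NOT real data; the timestamp format is a guess.
SAMPLE = """comment_id,article_id,created_time,subject,reply_to_comment_id,content,is_anonymous,is_enabled,channel_language,create_user_id,moderated_by
1,100,2009-01-01 10:00:00,Tere,,Esimene kommentaar,1,1,nat,17,
2,100,2009-01-01 10:05:00,,1,Второй комментарий,0,0,rus,0,5
3,101,2009-01-02 08:30:00,,,Kolmas kommentaar,0,1,nat,42,
"""

if __name__ == "__main__":
    print(published_by_language(read_comments(io.StringIO(SAMPLE))))
```

For a real file, the in-memory sample could be replaced with open("comments_2009.csv", encoding="utf-8", newline=""). Reading row by row keeps memory use low, which matters with several million comments per file. Given the note on is_enabled, counts based on that flag are best treated as approximate.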

Identifier
PID http://hdl.handle.net/11356/1401
Related Identifier https://doi.org/10.21248/jlcl.34.2020.224
Related Identifier https://www.aclweb.org/anthology/2021.hackashop-1.14.pdf
Related Identifier http://embeddia.eu/
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1401
Provenance
Creator Shekhar, Ravi; Pollak, Senja; Pelicon, Andraž; Purver, Matthew; Krustok, Ivar
Publisher Ekspress Meedia Group
Publication Year 2021
Funding Reference info:eu-repo/grantAgreement/EC/H2020/825153
Rights Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0); PUB; https://creativecommons.org/licenses/by-nc-nd/4.0/
OpenAccess true
Contact info(at)clarin.si
Representation
Language Estonian; Russian
Resource Type corpus
Format text/plain; charset=utf-8; application/octet-stream; text/csv; downloadable_files_count: 12
Discipline Linguistics