-
Latvian user comment dataset 1.0
The dataset is an archive of reader comments from the Delfi news site from 2014-2019, containing approximately 12M comments, mostly in the Latvian language, with some in...
-
Keyword extraction datasets for Croatian, Estonian, Latvian and Russian 1.0
EACL Hackashop Keyword Challenge Datasets. This repository contains the IDs of articles used for the keyword extraction challenge at the EACL Hackashop on News Media Content...
-
Ekspress news article archive (in Estonian and Russian) 1.0
The dataset is an archive of articles from the Ekspress Meedia news site from 2009-2019, containing over 1.4M articles, mostly in the Estonian language (1,115,120 articles) with...
-
Latvian Delfi article archive (in Latvian and Russian) 1.0
This dataset is an archive of articles from the Delfi news site from 2015-2019, containing over 180,000 articles (c. 50% in Latvian and 50% in Russian). Keywords...
-
Multilingual Culture-Independent Word Analogy Datasets
The word analogy task evaluates word embeddings based on analogous word pairs (e.g. "Paris - France" should be equivalent to "Rome - Italy", "son - daughter" should be equivalent to...
-
Ekspress user comment dataset 1.0
This dataset is an archive of reader comments on the Ekspress Meedia news site from 2009-2019, containing approximately 31M comments, mostly in the Estonian language, with some...
