-
Reference List of Slovene Frequent Common Words
The reference list of the most frequent Slovene common words was prepared by selecting vocabulary at the intersection of the most frequent 10,000 lemmas of four Slovene text... -
A Resource for Evaluating Graded Word Similarity in Context: CoSimLex
The dataset contains human similarity ratings for pairs of words. The annotators were presented with contexts that contained both of the words in the pair and the dataset... -
Dataset of Slovene idiomatic expressions SloIE
SloIE is a manually labelled dataset of Slovene idiomatic expressions. It contains 29,400 sentences with 75 different expressions that can occur with either a literal or an... -
Keyword extraction datasets for Croatian, Estonian, Latvian and Russian 1.0
EACL Hackashop Keyword Challenge Datasets. In this repository you can find the IDs of articles used for the keyword extraction challenge at EACL Hackashop on News Media Content... -
24sata news comment dataset 1.0
A dataset of user comments provided for research purposes for EMBEDDIA, a Horizon 2020 project, extracted from the database of user comments from the 24sata.hr news... -
Ekspress news article archive (in Estonian and Russian) 1.0
The dataset is an archive of articles from the Ekspress Meedia news site from 2009-2019, containing over 1.4M articles, mostly in Estonian (1,115,120 articles) with... -
Latvian Delfi article archive (in Latvian and Russian) 1.0
This dataset is an archive of articles from the Delfi news site from 2015-2019, containing over 180,000 articles (c. 50% in Latvian and 50% in Russian). Keywords... -
Multilingual Culture-Independent Word Analogy Datasets
The word analogy task evaluates word embeddings based on analogous word pairs (e.g. "Paris - France" should be equivalent to "Rome - Italy", "son - daughter" should be equivalent to... -
EMBEDDIA tools output example corpus of Estonian, Croatian and Latvian news a...
This dataset contains articles from EMBEDDIA Media partners with various information added by the tools developed within the EMBEDDIA project: - 12,390 Estonian articles from...
