Dataset - B2FIND

ELMo embeddings model, Slovenian

ELMo language model (https://github.com/allenai/bilm-tf) used to produce contextual word embeddings, trained on the entire Gigafida 2.0 corpus...

Slovenian RoBERTa contextual embeddings model: SloBERTa 2.0

The monolingual Slovene RoBERTa (A Robustly Optimized Bidirectional Encoder Representations from Transformers) model is a state-of-the-art model representing words/tokens as...

24sata news article archive 1.0

The 24sata news portal consists of a portal with daily news and several smaller portals covering news on specific topics, such as automotive news, health, culinary content,...

Slovenian keyword extraction dataset from SentiNews 1.0

The dataset consists of 7514 Slovenian news articles from the SentiNews 1.0 corpus by Bučar et al. 2017 (http://hdl.handle.net/11356/1110) for which article keywords were available....

You can also access this registry using the API (see API Docs).
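
As a rough illustration of API access, the sketch below builds a search URL and summarizes a response without making any network call. It assumes the registry exposes a CKAN-style Action API with a `package_search` endpoint; the base URL, the helper names `build_search_url` and `summarize`, and the sample response are illustrative assumptions, not taken from the API Docs.

```python
from urllib.parse import urlencode

# Assumption: the registry serves a CKAN-style Action API at this base URL.
# Check the API Docs for the actual endpoint before relying on it.
BASE_URL = "https://b2find.eudat.eu/api/3/action/package_search"


def build_search_url(query, rows=10, base_url=BASE_URL):
    """Return a package_search URL for a free-text query (hypothetical helper)."""
    return f"{base_url}?{urlencode({'q': query, 'rows': rows})}"


def summarize(response):
    """Extract the hit count and dataset titles from a CKAN-style response dict."""
    result = response["result"]
    titles = [record.get("title", "") for record in result["results"]]
    return result["count"], titles


# Hand-written sample in the CKAN response shape, not real API output.
sample_response = {
    "success": True,
    "result": {
        "count": 2,
        "results": [
            {"title": "ELMo embeddings model, Slovenian"},
            {"title": "24sata news article archive 1.0"},
        ],
    },
}

url = build_search_url("Slovenian embeddings", rows=4)
count, titles = summarize(sample_response)
```

To query the live service, the URL could be fetched with any HTTP client and the decoded JSON passed to `summarize`.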

4 datasets found