Dataset - B2FIND

A Resource for Evaluating Graded Word Similarity in Context: CoSimLex

The dataset contains human similarity ratings for pairs of words. The annotators were presented with contexts that contained both of the words in the pair and the dataset...

Slovenian RoBERTa contextual embeddings model: SloBERTa 1.0

The monolingual Slovene RoBERTa (A Robustly Optimized Bidirectional Encoder Representations from Transformers) model is a state-of-the-art model representing words/tokens as...

CroSloEngual BERT

Trilingual BERT (Bidirectional Encoder Representations from Transformers) model, trained on Croatian, Slovenian, and English data. State of the art tool representing...

ELMo embeddings model, Slovenian

ELMo language model (https://github.com/allenai/bilm-tf) used to produce contextual word embeddings, trained on entire Gigafida 2.0 corpus...

Slovenian RoBERTa contextual embeddings model: SloBERTa 2.0

The monolingual Slovene RoBERTa (A Robustly Optimized Bidirectional Encoder Representations from Transformers) model is a state-of-the-art model representing words/tokens as...

CroSloEngual BERT 1.1

Trilingual BERT (Bidirectional Encoder Representations from Transformers) model, trained on Croatian, Slovenian, and English data. State of the art tool representing...

ELMo embeddings models for seven languages

ELMo language model (https://github.com/allenai/bilm-tf) used to produce contextual word embeddings, trained on large monolingual corpora for 7 languages: Slovenian, Croatian,...

ELMo Embeddings for Polish

A model of ELMo embeddings for Polish language trained on large textual corpora (KGR10). To retrain the model please use the checkpoint and vocabulary files available at:...

LitLat BERT

Trilingual BERT-like (Bidirectional Encoder Representations from Transformers) model, trained on Lithuanian, Latvian, and English data. State of the art tool representing...

9 datasets found