The Orange workflow for observing collocation clusters ColEmbed 1.0
ColEmbed is a workflow (.OWS file) for Orange Data Mining (an open-source machine learning and data...
Slovenian RoBERTa contextual embeddings model: SloBERTa 2.0
The monolingual Slovene RoBERTa (A Robustly Optimized Bidirectional Encoder Representations from Transformers) model is a state-of-the-art model representing words/tokens as...
ELMo embeddings models for seven languages
ELMo language model (https://github.com/allenai/bilm-tf) used to produce contextual word embeddings, trained on large monolingual corpora for 7 languages: Slovenian, Croatian,...
Word embeddings CLARIN.SI-embed.hr 1.0
CLARIN.SI-embed.hr contains word embeddings induced from a large collection of Croatian texts composed of the Croatian web corpus hrWaC and a 400-million-token collection...
Slovenian RoBERTa contextual embeddings model: SloBERTa 1.0
The monolingual Slovene RoBERTa (A Robustly Optimized Bidirectional Encoder Representations from Transformers) model is a state-of-the-art model representing words/tokens as...
Word embeddings CLARIN.SI-embed.sl 1.0
CLARIN.SI-embed.sl contains word embeddings induced from a large collection of Slovene texts composed of existing corpora of Slovene, e.g. GigaFida, Janes, KAS, slWaC, etc. The...
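The CLARIN.SI-embed packages listed above provide static word vectors, and the basic operation on such vectors is cosine similarity between words. A minimal, self-contained sketch of that computation follows; the vectors and Slovene words here are hypothetical toy data (real CLARIN.SI-embed vectors are typically 100-300 dimensional), chosen only to illustrate the idea.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 4-dimensional vectors standing in for real embeddings.
embeddings = {
    "hiša": [0.9, 0.1, 0.3, 0.0],   # "house"
    "dom":  [0.8, 0.2, 0.4, 0.1],   # "home"
    "riba": [0.0, 0.9, 0.1, 0.8],   # "fish"
}

print(cosine_similarity(embeddings["hiša"], embeddings["dom"]))   # ≈ 0.98
print(cosine_similarity(embeddings["hiša"], embeddings["riba"]))  # ≈ 0.10
```

With well-trained embeddings, semantically related words (here "hiša"/"dom") score close to 1.0, while unrelated words score much lower, which is what makes these vectors useful for relatedness and analogy tasks.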
Word embeddings CLARIN.SI-embed.bg 1.0
CLARIN.SI-embed.bg contains word embeddings for Bulgarian induced from the MaCoCu-bg web crawl corpus (http://hdl.handle.net/11356/1515). The embeddings are based on the...
PoLitBert_v32k_cos1_5_50k - Polish RoBERTa model
Polish RoBERTa model trained on Polish Wikipedia, Polish literature and OSCAR.
Lithuanian Word embeddings
GloVe-type word vectors (embeddings) for Lithuanian. The Delfi.lt corpus (~70 million words) and StanfordNLP were used for training. The training consisted of several stages: 1)...
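Embedding packages like this one are commonly distributed in the plain-text word2vec/GloVe format: an optional "count dimension" header line, then one token per line followed by its whitespace-separated vector components. A minimal parsing sketch under that assumption follows; the file layout of any specific package should be checked against its documentation, and the Lithuanian words below are toy examples.

```python
def parse_embeddings(lines):
    """Parse word2vec/GloVe-style text: one token per line,
    followed by its whitespace-separated vector components."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank lines
        if len(parts) == 2 and all(p.isdigit() for p in parts):
            continue  # word2vec text header line, e.g. "70000 300"
        word, components = parts[0], parts[1:]
        vectors[word] = [float(x) for x in components]
    return vectors

# Toy input mimicking the format: a "2 3" header, then two words
# with 3-dimensional vectors.
sample = """\
2 3
namas 0.12 -0.40 0.33
žodis 0.05 0.22 -0.10
""".splitlines()

vecs = parse_embeddings(sample)
print(sorted(vecs))        # ['namas', 'žodis']
print(len(vecs["namas"]))  # 3
```

In practice one would stream the (often multi-gigabyte) file line by line rather than loading it whole, which is why the function accepts any iterable of lines.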
LitLat BERT
Trilingual BERT-like (Bidirectional Encoder Representations from Transformers) model, trained on Lithuanian, Latvian, and English data. A state-of-the-art tool representing...
PoLitBert_v32k_cos1_2_50k - Polish RoBERTa model
Polish RoBERTa model trained on Polish Wikipedia, Polish literature and OSCAR.
EWBST tests for English
The submission contains tests generated for the EWBST test of English word embedding models. The tests were created with Princeton WordNet and plWN English synsets.
Finnish Semantic Relatedness Model
This model is a semantic model that captures the relatedness of Finnish words as word vectors. This model can be used in various tasks such as metaphor interpretation. For...
Package of word embeddings of Czech from a large corpus
This package comprises eight models of Czech word embeddings trained by applying word2vec (Mikolov et al. 2013) to the currently most extensive corpus of Czech, namely SYN v9...
Digital humanities: Introduction. A 10-week course with practical sessions.
The aim of the course is to introduce digital humanities and to describe various aspects of digital content processing. The course consists of 10 lessons with video material and...
CoNLL 2017 Shared Task - Automatically Annotated Raw Texts and Word Embeddings
Automatic segmentation, tokenization and morphological and syntactic annotations of raw texts in 45 languages, generated by UDPipe (http://ufal.mff.cuni.cz/udpipe), together...
Pretrained word and multi-sense embeddings for Estonian
Word and multi-sense embeddings for Estonian trained on the lemmatized etTenTen: Corpus of the Estonian Web. Word embeddings are trained with word2vec. Sense embeddings are trained...