The Orange workflow for observing collocation clusters ColEmbed 1.0
ColEmbed is a workflow (.OWS file) for Orange Data Mining (an open-source machine learning and data...
Slovenian RoBERTa contextual embeddings model: SloBERTa 2.0
The monolingual Slovene RoBERTa (A Robustly Optimized Bidirectional Encoder Representations from Transformers) model is a state-of-the-art model representing words/tokens as...
ELMo embeddings models for seven languages
ELMo language model (https://github.com/allenai/bilm-tf) used to produce contextual word embeddings, trained on large monolingual corpora for 7 languages: Slovenian, Croatian,...
Word embeddings CLARIN.SI-embed.hr 1.0
CLARIN.SI-embed.hr contains word embeddings induced from a large collection of Croatian texts composed of the Croatian web corpus hrWaC and a 400-million-token collection...
Slovenian RoBERTa contextual embeddings model: SloBERTa 1.0
The monolingual Slovene RoBERTa (A Robustly Optimized Bidirectional Encoder Representations from Transformers) model is a state-of-the-art model representing words/tokens as...
Word embeddings CLARIN.SI-embed.sl 1.0
CLARIN.SI-embed.sl contains word embeddings induced from a large collection of Slovene texts composed of existing corpora of Slovene, e.g. GigaFida, Janes, KAS, slWaC, etc. The...
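The CLARIN.SI-embed packages listed above provide static word vectors, and the basic operation on such vectors is cosine similarity between words. A minimal, self-contained sketch of that computation follows; the vectors and Slovene words here are hypothetical toy data (real CLARIN.SI-embed vectors are typically 100-300 dimensional), chosen only to illustrate the idea.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 4-dimensional vectors standing in for real embeddings.
embeddings = {
    "hiša": [0.9, 0.1, 0.3, 0.0],   # "house"
    "dom":  [0.8, 0.2, 0.4, 0.1],   # "home"
    "riba": [0.0, 0.9, 0.1, 0.8],   # "fish"
}

print(cosine_similarity(embeddings["hiša"], embeddings["dom"]))   # ≈ 0.98
print(cosine_similarity(embeddings["hiša"], embeddings["riba"]))  # ≈ 0.10
```

With well-trained embeddings, semantically related words (here "hiša"/"dom") score close to 1.0, while unrelated words score much lower, which is what makes these vectors useful for relatedness and analogy tasks.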
Word embeddings CLARIN.SI-embed.bg 1.0
CLARIN.SI-embed.bg contains word embeddings for Bulgarian induced from the MaCoCu-bg web crawl corpus (http://hdl.handle.net/11356/1515). The embeddings are based on the...
PoLitBert_v32k_cos1_5_50k - Polish RoBERTa model
Polish RoBERTa model trained on Polish Wikipedia, Polish literature and OSCAR.
Lithuanian Word embeddings
GloVe-type word vectors (embeddings) for Lithuanian. The Delfi.lt corpus (~70 million words) and StanfordNLP were used for training. The training consisted of several stages: 1)...
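Embedding packages like this one are commonly distributed in the plain-text word2vec/GloVe format: an optional "count dimension" header line, then one token per line followed by its whitespace-separated vector components. A minimal parsing sketch under that assumption follows; the file layout of any specific package should be checked against its documentation, and the Lithuanian words below are toy examples.

```python
def parse_embeddings(lines):
    """Parse word2vec/GloVe-style text: one token per line,
    followed by its whitespace-separated vector components."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank lines
        if len(parts) == 2 and all(p.isdigit() for p in parts):
            continue  # word2vec text header line, e.g. "70000 300"
        word, components = parts[0], parts[1:]
        vectors[word] = [float(x) for x in components]
    return vectors

# Toy input mimicking the format: a "2 3" header, then two words
# with 3-dimensional vectors.
sample = """\
2 3
namas 0.12 -0.40 0.33
žodis 0.05 0.22 -0.10
""".splitlines()

vecs = parse_embeddings(sample)
print(sorted(vecs))        # ['namas', 'žodis']
print(len(vecs["namas"]))  # 3
```

In practice one would stream the (often multi-gigabyte) file line by line rather than loading it whole, which is why the function accepts any iterable of lines.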
LitLat BERT
Trilingual BERT-like (Bidirectional Encoder Representations from Transformers) model, trained on Lithuanian, Latvian, and English data. A state-of-the-art tool representing...
PoLitBert_v32k_cos1_2_50k - Polish RoBERTa model
Polish RoBERTa model trained on Polish Wikipedia, Polish literature and OSCAR.
EWBST tests for English
The submission contains tests generated for the EWBST test of English word embedding models. The tests were created with Princeton WordNet and plWN English synsets.
Finnish Semantic Relatedness Model
This model is a semantic model that captures the relatedness of Finnish words as word vectors. This model can be used in various tasks such as metaphor interpretation. For...
Package of word embeddings of Czech from a large corpus
This package comprises eight models of Czech word embeddings trained by applying word2vec (Mikolov et al. 2013) to the currently most extensive corpus of Czech, namely SYN v9...
Digital humanities: Introduction. A 10-week course with practical sessions.
The aim of the course is to introduce digital humanities and to describe various aspects of digital content processing. The course consists of 10 lessons with video material and...
CoNLL 2017 Shared Task - Automatically Annotated Raw Texts and Word Embeddings
Automatic segmentation, tokenization and morphological and syntactic annotations of raw texts in 45 languages, generated by UDPipe (http://ufal.mff.cuni.cz/udpipe), together...
Pretrained word and multi-sense embeddings for Estonian
Word and multi-sense embeddings for Estonian trained on the lemmatized etTenTen: Corpus of the Estonian Web. Word embeddings are trained with word2vec. Sense embeddings are trained...