-
Slovene translation of the SQuAD2.0 dataset
The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to... -
Natural Language 2 Semantic Hypergraph Dataset NL2SH 1.0
The NL2SH (Natural Language to Semantic Hypergraph) dataset can be used to build and evaluate methods for knowledge extraction and representation based on a semantic hypergraph.... -
Pretrained models for recognising sex education concepts SemSEX 1.0
Pretrained language models for detecting and classifying the presence of sex education concepts in Slovene curriculum documents. The models are PyTorch neural network models,... -
Corpus for identifying sex education concepts SemSex 1.0
The SemSex corpus is designed to facilitate the automated recognition of sex education concepts within curriculum description documents. The corpus contains two components:... -
Terminological dictionary of artificial intelligence
The terminological dictionary was compiled within the framework of the project Development of Slovene in the Digital Environment. It is an example collection of 413 terms from... -
KPWr chunks 2021
357 documents from the KPWr corpus, manually annotated at the syntactic level (chunks). Please cite as: Oleksy, M., Walentynowicz, W., & Wieczorek, J. (2021). New approach to the... -
Training and development dataset for information extraction in plant epidemio...
The “Training and development dataset for information extraction in plant epidemiomonitoring” is the annotation set of the “Corpus for the epidemiomonitoring of plant”. The... -
F1000RD
F1000RD is the first openly licensed, multi-domain corpus of publications, their revisions and peer reviews from an open reviewing platform. -
M2QA: A Multi-domain Multilingual Question Answering Benchmark Dataset
M2QA (Multi-domain Multilingual Question Answering) is an extractive question answering benchmark for evaluating joint language and domain transfer. M2QA includes 13,500 SQuAD... -
Slovenian commonsense reasoning model SloMET-ATOMIC 2020
SloMET-ATOMIC 2020 is a Slovene commonsense reasoning model that predicts commonsense descriptions in natural language for a given input sentence. The model is...