Dataset - B2FIND

Source code and data for the PhD Thesis "Metrics of Graph-Based Meaning Repre...

This dataset contains source code and data used in the PhD thesis "Metrics of Graph-Based Meaning Representations with Applications from Parsing Evaluation to Explainable NLG...

Source code and data for the PhD Thesis "Measuring the Contributions of Visio...

This dataset contains source code and data used in the PhD thesis "Measuring the Contributions of Vision and Text Modalities in Multimodal Transformers". The dataset is split...

Extensions to the Slovene translation of SuperGLUE

SuperGLUE is a benchmark styled after GLUE with a new set of more difficult language understanding tasks, improved resources, and a public leaderboard. It consists of 8...

Slovene translation of the SQuAD2.0 dataset

Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to...

Natural Language 2 Semantic Hypergraph Dataset NL2SH 1.0

The NL2SH (Natural Language to Semantic Hypergraph) dataset can be used to build and evaluate methods for knowledge extraction and representation based on a semantic hypergraph....

Pretrained models for recognising sex education concepts SemSEX 1.0

Pretrained language models for detecting and classifying the presence of sex education concepts in Slovene curriculum documents. The models are PyTorch neural network models,...

Corpus for identifying sex education concepts SemSex 1.0

The SemSex corpus is designed to facilitate the automated recognition of sexual education concepts within curriculum description documents. The corpus contains two components:...

Terminological dictionary of artificial intelligence

The terminological dictionary was compiled within the framework of the project Development of Slovene in the Digital Environment. It is an example collection of 413 terms from...

KPWr chunks 2021

357 documents from the KPWr corpus, manually annotated at the syntactic level (chunks). Please cite as: Oleksy, M., Walentynowicz, W., & Wieczorek, J. (2021). New approach to the...

F1000RD

F1000RD is the first openly licensed, multi-domain corpus of publications, their revisions and peer reviews from an open reviewing platform.

M2QA: A Multi-domain Multilingual Question Answering Benchmark Dataset

M2QA (Multi-domain Multilingual Question Answering) is an extractive question answering benchmark for evaluating joint language and domain transfer. M2QA includes 13,500 SQuAD...

Slovenian commonsense reasoning model SloMET-ATOMIC 2020

SloMET-ATOMIC 2020 is a Slovene commonsense reasoning model that can predict commonsense descriptions in natural language for a given input sentence. The model is...
