Dataset - B2FIND

UKP Convincing Arguments v1

Corpus content: UKPConvArg1-full-XML. This is the full corpus as referred to in the article (Table 2, UKPConvArgAll). It contains 32 XML files, each file corresponding to one...

Cognate pairs for several languages

Cognates for the following language pairs can be used for research purposes: en-es, en-de, en-ru, en-el, en-fa, de-cz. Includes: * The training and test data for the en-es...

Fine-tuned model weights for Stance Detection Benchmark System

This collection includes model weights (BERT-based), fine-tuned in a multi-task setting on 10 heterogeneous stance detection datasets. For more information, please refer to the...

Forum Post Quality Dataset

The dataset has been compiled from Nabble.com. It has been used and is described in the papers listed below. The data can be obtained on request.

Football Coreference Corpus

This script generates: the original sentence-level Football Coreference Corpus (FCC), a version of the sentence-level FCC which was cleaned and updated after manual review,...

Personality Profiling of Fictional Characters using Sense-Level Links between...

This dataset contains the personality gold standard of 298 book characters annotated for their MBTI traits, gathered manually from the http://mbti-databank.com/ website and...

Visual Feature Track Dataset

This dataset contains 282 visual feature tracks. A visual feature track is a sequence of feature observations of the same real 3D landmark in consecutive image frames. These...

WWW 2019 X-Ling Question Retrieval Data v1

This repository contains the data and code to reproduce the results of our paper "Improved Cross-Lingual Question Retrieval for Community Question Answering"...

Whittle Networks datasets

Datasets for the paper "Whittle Networks: A Deep Likelihood Model for Time Series". Paper at http://proceedings.mlr.press/v139/yu21c.html Code at...

Verb Sense Labelling

Vocabulary used for the creation of sense patterns:

Fast Axiomatic Attribution for Neural Networks

Mitigating the dependence on spurious correlations present in the training dataset is a quickly emerging and important topic of deep learning. Recent approaches include priors...

Self-supervised Augmentation Consistency for Adapting Semantic Segmentation

We propose an approach to domain adaptation for semantic segmentation that is both practical and highly accurate. In contrast to previous work, we abandon the use of...

Dense Unsupervised Learning for Video Segmentation

We present a novel approach to unsupervised learning for video object segmentation (VOS). Unlike previous work, our formulation allows learning dense feature representations...

Single-stage Semantic Segmentation from Image Labels

Recent years have seen a rapid growth in new approaches improving the accuracy of semantic segmentation in a weakly supervised setting, i.e. with only image-level labels...

On emergence

Output files for the paper "Are Emergent Abilities in Large Language Models just In-Context Learning?".

The TYC Dataset for Understanding Instance-Level Semantics and Motions of Cel...

TYC dataset proposed in the paper "The TYC Dataset for Understanding Instance-Level Semantics and Motions of Cells in Microstructures" [ICCVW 2023]. Project page:...

Analyzing Dataset Annotation Quality Management in the Wild

This is the accompanying data for the paper "Analyzing Dataset Annotation Quality Management in the Wild". Data quality is crucial for training accurate, unbiased, and...

Lessons Learned from a Citizen Science Project for Natural Language Processing

This is the accompanying data for our paper "Lessons Learned from a Citizen Science Project for Natural Language Processing". Many Natural Language Processing (NLP) systems use...

Annotation Error Detection: Analyzing the Past and Present for a More Coheren...

This is the accompanying data for our paper "Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future". Annotated data is an essential ingredient in...

DRZ Living Lab Tracked Robot SLAM Dataset

Data set for the evaluation of SLAM systems in challenging terrains. The data set covers four sequences with challenging terrain, each tracked with a high-performance Qualisys...
