Dataset - B2FIND

Boosting Unsupervised Semantic Segmentation with Principal Mask Proposals

Unsupervised semantic segmentation aims to automatically partition images into semantically meaningful regions by identifying global semantic categories within an image corpus...

Actor-critic Instance Segmentation

Most approaches to visual scene analysis have emphasised parallel processing of the image elements. However, one area in which the sequential nature of vision is apparent, is...

Turk Bootstrap Word Sense Inventory (TWSI) 2.0

Turk Bootstrap Word Sense Inventory (TWSI) 2.0. This lexical resource, created by a crowdsourcing process using Amazon Mechanical Turk (http://www.mturk.com), encompasses a...

PeerQA: A Scientific Question Answering Dataset from Peer Reviews

We present PeerQA, a real-world, scientific, document-level Question Answering (QA) dataset. PeerQA questions have been sourced from peer reviews, which contain questions that...

Lexical Substitution Dataset for German

This article describes a lexical substitution dataset for German. The whole dataset contains 2,040 sentences from the German Wikipedia, with one target word in each sentence....

Self-supervised Augmentation Consistency for Adapting Semantic Segmentation

We propose an approach to domain adaptation for semantic segmentation that is both practical and highly accurate. In contrast to previous work, we abandon the use of...

Single-stage Semantic Segmentation from Image Labels

Recent years have seen a rapid growth in new approaches improving the accuracy of semantic segmentation in a weakly supervised setting, i.e. with only image-level labels...

RevUtil

Providing constructive feedback to paper authors is a core component of peer review. With reviewers increasingly having less time to perform reviews, automated support systems...

Dense Unsupervised Learning for Video Segmentation

We present a novel approach to unsupervised learning for video object segmentation (VOS). Unlike previous work, our formulation allows learning dense feature representations...

ARR Data Collection Initiative 2025

Dataset of peer review reports, meta-reviews, reviewer-author discussions, and paper drafts collected from ACL Rolling Review within the context of the new data collection...

Themis: Training Robust Multilingual Code Reward Models for Flexible Multi-Cr...

Themis-CodeRewardBench is a code-specific reward model evaluation benchmark comprising ~8.9k diverse code preference pairs across eight programming languages and five quality...

Review Quality Estimation

This is the complete dataset which contains: (1) Data from OpenReview and ARR for 16 venues. (2) Preprocessed data (itemization + sentence split) for 1000 reviews sampled from...

MAGneT

MAGneT is a synthetic counseling session dataset generated using a novel Multi-Agent framework including: specialized response agents (reflection, questioning, solutions,...

Exposía: Academic Writing Assessment of Exposés and Peer Feedback

Exposía is a publicly available research dataset that captures the full, pedagogically grounded process of academic writing and feedback in higher education. The dataset...

PeerQA-XT

The rapid growth of scientific publications makes it increasingly difficult for researchers to keep up with new findings. Scientific question answering (QA) systems aim to...

Author-in-the-Loop Response Generation and Evaluation: Integrating Author Exp...

Re3Align, a new large-scale dataset for author-in-the-loop response generation, comprising 3.4k complete paper records (review, response, paper and revised paper) with 440k...

SciCoQA: Quality Assurance for Scientific Paper–Code Alignment

We present SciCoQA, a dataset for detecting discrepancies between scientific publications and their codebases to ensure faithful implementations. We construct SciCoQA from...

No Needles Attached? Inferring Energy Metabolism Zones and Lactate Accumulati...

These are the supplementary materials to the publication "No Needles Attached? Inferring Energy Metabolism Zones and Lactate Accumulation from Touchscreen Input". The repository...

M4FC dataset

M4FC dataset, accompanying the paper "M4FC: a Multimodal, Multilingual, Multicultural, Multitask Real-World Fact-Checking Dataset". The dataset contains annotations for 4,982...

eacl2026-assessing-paper-novelty

Dataset for evaluating automated novelty assessment in academic papers. Contains 182 ICLR submissions with human annotations, LLM-derived novelty assessments from reviewer...

52 datasets found