HRVQA dataset underlying the PhD thesis of Kun Li: Interactive Vision-Language Understanding: from Question Answering to Guided Segmentation

Dataset

Visual question answering (VQA) is an important and challenging multimodal task in computer vision. We bring the VQA task to high-resolution aerial images and propose HRVQA, a large-scale dataset built with a semi-automatic construction scheme, which covers inferences ranging from commonly seen reasoning tasks to specific attribute recognition.

Identifier
DOI	https://doi.org/10.17026/PT/31SAWD
Metadata Access	https://phys-techsciences.datastations.nl/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.17026/PT/31SAWD
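
The Metadata Access link above is an OAI-PMH GetRecord request. As a minimal sketch, the same URL can be assembled from the DOI with the Python standard library. The function name `oai_getrecord_url` and its parameter names are hypothetical and chosen for illustration; the endpoint, verb, metadata prefix, and identifier are taken from the record above. No network request is made here.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint as it appears in the Metadata Access field of this record.
OAI_ENDPOINT = "https://phys-techsciences.datastations.nl/oai"


def oai_getrecord_url(doi, base=OAI_ENDPOINT, prefix="oai_datacite"):
    """Build an OAI-PMH GetRecord URL for a DOI (hypothetical helper)."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "metadataPrefix": prefix,
            "identifier": f"doi:{doi}",
        },
        safe=":/",  # keep ':' and '/' unescaped, matching the record's URL
    )
    return f"{base}?{query}"


url = oai_getrecord_url("10.17026/PT/31SAWD")

# Split the URL back into its query parameters to inspect the request.
params = parse_qs(urlsplit(url).query)
print(url)
print(params["identifier"][0])
```

Fetching the resulting URL with any HTTP client should return the DataCite-formatted metadata as XML, assuming the endpoint is reachable.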

Provenance
Creator	K. Li
Publisher	DANS Data Station Physical and Technical Sciences
Contributor	Kun Li; George Vosselman; Michael Ying Yang
Publication Year	2025
Rights	CC-BY-NC-4.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by-nc/4.0
OpenAccess	true
Contact	Kun Li (Faculty of Geo-Information Science and Earth Observation (ITC), University of Twente)

Representation
Resource Type	Dataset
Format	text/plain; application/zip; application/json
Size	1006; 8425798001; 5773743363; 8086057719; 8563849806; 8714872700; 8879414186; 8738366234; 8843444797; 8887121760; 8780515386; 8276100900; 77621314; 32401113; 105770001; 8102174; 26477242
Version	1.0
Discipline	Earth and Environmental Science; Environmental Research; Geosciences; Natural Sciences
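
The Size field above lists one value per file. As a rough sketch for estimating the required download space, the values can be parsed and summed. This assumes the values are byte counts, which the record does not state explicitly; the variable names are illustrative only.

```python
# Size field copied verbatim from the record above.
SIZE_FIELD = (
    "1006; 8425798001; 5773743363; 8086057719; 8563849806; 8714872700; "
    "8879414186; 8738366234; 8843444797; 8887121760; 8780515386; "
    "8276100900; 77621314; 32401113; 105770001; 8102174; 26477242"
)

# Split on ';' and convert each entry to an integer.
sizes = [int(part) for part in SIZE_FIELD.split(";")]

file_count = len(sizes)
total = sum(sizes)

# ASSUMPTION: values are bytes; convert to decimal gigabytes for readability.
total_gb = total / 1e9

print(f"{file_count} files, {total} total (about {total_gb:.1f} GB if bytes)")
```

Under that assumption, the 17 files total roughly 92.2 GB, most of it in the eleven large archives.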