HRVQA dataset underlying the PhD thesis of Kun Li: Interactive Vision-Language Understanding: from Question Answering to Guided Segmentation

Visual question answering (VQA) is an important and challenging multimodal task in computer vision. We bring the VQA task to high-resolution aerial images and propose HRVQA, a large-scale dataset built with a semi-automatic construction scheme. Its questions range from common reasoning tasks to the recognition of specific attributes.

Identifier
DOI https://doi.org/10.17026/PT/31SAWD
Metadata Access https://phys-techsciences.datastations.nl/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.17026/PT/31SAWD
Provenance
Creator K. Li
Publisher DANS Data Station Physical and Technical Sciences
Contributor Kun Li; George Vosselman; Michael Ying Yang
Publication Year 2025
Rights CC-BY-NC-4.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by-nc/4.0
OpenAccess true
Contact Kun Li (Faculty of Geo-Information Science and Earth Observation (ITC), University of Twente)
Representation
Resource Type Dataset
Format text/plain; application/zip; application/json
Size 1006; 8425798001; 5773743363; 8086057719; 8563849806; 8714872700; 8879414186; 8738366234; 8843444797; 8887121760; 8780515386; 8276100900; 77621314; 32401113; 105770001; 8102174; 26477242
Version 1.0
Discipline Earth and Environmental Science; Environmental Research; Geosciences; Natural Sciences