We investigated the effects of stimulus-question ordering and of the modality in which the question is presented in a user study with visual question answering (VQA) tasks. In an eye-tracking user study (N=13), we tested five conditions within subjects. The conditions were counterbalanced to account for order effects. We collected participants' answers to the VQA tasks and their responses to the NASA TLX questionnaire after each completed condition; gaze data was recorded only during exposure to the image stimulus.
We provide the data and scripts used for the statistical analysis, the files used for the exploratory analysis in WebVETA, the image stimuli used per condition and for training, as well as the VQA tasks related to the images. The images and questions used in the user study are a subset of the GQA dataset (Hudson and Manning, 2019). For more information see: https://cs.stanford.edu/people/dorarad/gqa/index.html
The mean fixation duration, hit-any-AOI rate, and scan paths were generated using Gazealytics (https://www2.visus.uni-stuttgart.de/gazealytics/). The hit-any-AOI rate and mean fixation duration were calculated per participant per image stimulus.
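For readers who want to recompute these two metrics outside Gazealytics, the R sketch below shows how they could be derived from a per-fixation table. The file name and column names (participant, stimulus, duration_ms, aoi) are illustrative assumptions, not the format of the released data.

    # Minimal sketch (not the released analysis script): derive the two gaze
    # metrics from a hypothetical per-fixation table.
    library(dplyr)

    fixations <- read.csv("fixations.csv")  # hypothetical export, one row per fixation

    metrics <- fixations %>%
      group_by(participant, stimulus) %>%
      summarise(
        mean_fixation_duration = mean(duration_ms),
        # hit-any-AOI rate: share of fixations landing on any area of interest
        hit_any_aoi_rate = mean(!is.na(aoi)),
        .groups = "drop"
      )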
RStudio, 2022.12.0+353
Gazealytics, V1.0
Users interested in reproducing the results can follow the methodology as reported in the paper and use the image and question stimuli from the Files section. The analysis code (R scripts) and the data are also located in the Files section.
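As an illustration of the kind of within-subjects comparison this design calls for, the sketch below runs a Friedman test on per-condition NASA TLX scores; the authoritative tests are those in the released R scripts. The file name and column names (participant, condition, tlx_score) are assumptions for illustration.

    # Minimal sketch of a within-subjects comparison across the five conditions.
    tlx <- read.csv("nasa_tlx.csv")  # hypothetical: one TLX score per participant per condition
    tlx$participant <- factor(tlx$participant)
    tlx$condition <- factor(tlx$condition)

    # Friedman test: non-parametric repeated-measures comparison of TLX scores,
    # blocking on participant
    friedman.test(tlx_score ~ condition | participant, data = tlx)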
The zip files inside webveta_files.zip should be uploaded to Gazealytics (https://www2.visus.uni-stuttgart.de/gazealytics/) as zip files, i.e., without extracting them.