This dataset contains source code and data used in the PhD thesis "Measuring the Contributions of Vision and Text Modalities in Multimodal Transformers". The dataset is split into five repositories:
Code and resources related to chapter 2 of the thesis (Section 2.2, the method described in "Using Scene Graph Representations and Knowledge Bases").
Code and resources related to chapter 3 of the thesis (VALSE dataset).
Code and resources related to chapter 4 of the thesis: MM-SHAP measure and experiments code.
Code and resources related to chapter 5 of the thesis: CC-SHAP measure and experiments code related to large language models (LLMs).
Code and resources related to the experiments with vision and language model decoders from chapters 3, 4, and 5.