This is the data package for the Ph.D. thesis "Simplifying Imputation with Many Predictors in MICE Using Principal Component Analysis" by Edoardo Costantini.
MD files in this dataset can be opened using any text editor. The __MACOSX folder in the dataset can be ignored, as it was created automatically by OS X when the zip file was created.
The general README.md file in this dataset describes the contents of the data package, which contains five folders:
chapter-2 contains the code and data files to reproduce the results for the second chapter of the thesis.
chapter-3 contains the code and data files to reproduce the results for the third chapter of the thesis.
chapter-4 contains the code and data files to reproduce the results for the fourth chapter of the thesis.
chapter-5 contains the code and data files to reproduce the results for the fifth chapter of the thesis.
thesis contains the code and data files to reproduce the exact figures and tables in the thesis.
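After extracting the archive, it can be useful to confirm that the five folders are present. The following is a minimal sketch, not part of the data package: the function name missing_folders and the idea of passing the extraction directory as an argument are assumptions made for illustration.

```shell
# Hypothetical check (not part of the data package): print the name of every
# documented folder that is missing from an extracted copy of the dataset.
missing_folders() {
  root="${1:-.}"   # directory the archive was extracted into (assumed)
  for d in chapter-2 chapter-3 chapter-4 chapter-5 thesis; do
    # Print the folder name only when the directory does not exist
    [ -d "$root/$d" ] || echo "$d"
  done
}
```

Calling missing_folders with the path of the extracted dataset prints nothing when all five folders are found.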
Metadata and data collection
For every chapter in the thesis, the record of who worked on which files is reported as a .csv file registering the git log of every project. The .csv files were obtained with the following Zsh commands:
zsh
echo changes, date, contributor, message > log-changes.csv
git log --name-status --pretty=format:%h,%ai,%aL,%s >> log-changes.csv
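As a hedged illustration of how such a log could be inspected, the sketch below counts commits per contributor. It assumes the layout produced by the commands above: a header line, then one comma-separated line per commit (abbreviated hash, date, contributor, message), each followed by the tab-separated file-status lines that --name-status emits. The function name commits_per_contributor is hypothetical and is not part of the data package.

```shell
# Hypothetical helper (not part of the data package): count commits per
# contributor in a log file produced by the commands above.
commits_per_contributor() {
  # Commit lines start with a lowercase hex hash, a comma, and a YYYY-MM-DD date.
  # The header and the file-status lines (e.g. "M<TAB>path") do not match
  # this pattern and are skipped. The contributor is the third field.
  awk -F, '/^[0-9a-f]+,[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / { n[$3]++ }
           END { for (c in n) print c "," n[c] }' "$1" | sort
}
```

Because the contributor is read from the third field, commas inside commit messages do not affect the count. Running commits_per_contributor log-changes.csv prints one "contributor,count" line per contributor.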
Two main datasets were used throughout the chapters of the thesis:
European Values Study questionnaire data, as available on the GESIS portal (https://doi.org/10.4232/1.13511)
Fireworks disaster data, as made available in the CRAN version of the R package mice (see here), originally collected by de Roos, C., Greenwald, R., den Hollander-Gijsman, M., Noorthoorn, E., van Buuren, S., & de Jongh, A. (2011). A randomised comparison of cognitive behavioural therapy (CBT) and eye movement desensitisation and reprocessing (EMDR) in disaster-exposed children. European Journal of Psychotraumatology, 2(1). https://doi.org/10.3402/ejpt.v2i0.5694
The rest of the data used in this thesis were simulated by computer scripts to evaluate the properties of different statistical approaches.
Raw database
The README file in every folder provides instructions on how to load the data used and which computer scripts to run to process the data. The versions of the data used for the analysis by the author are available in the respective chapter folders, with one exception: the European Values Study data used in Chapters 2 and 4 could not be shared because of explicit limitations in the terms of use. However, the data are stored by GESIS and made available via password-protected online access, for which user registration is required. The DOI https://doi.org/10.4232/1.13511, also provided in the chapters using these data, points to the exact version of the data file.
Material
All material is available in the chapter folders. The README files contained in every chapter folder describe how to reproduce results.
Statistical processing
All computer scripts used to perform statistical processing are available in the chapter folders. The README files contained in every chapter folder describe how to reproduce results.
Processed database
All processed data used to generate the figures and tables in the thesis are available in the folder thesis/data/.
Accepted or published manuscript or publication
DOIs for the three published chapters:
Chapter 2: https://doi.org/10.1177/00491241231200194
Chapter 3: https://doi.org/10.3758/s13428-023-02117-1
Chapter 4: https://doi.org/10.48550/arXiv.2309.01608
Disclaimer
If any issues arise in replicating results while following the instructions provided in this data package, please refer to the instructions described in the chapters of the thesis.
The README files of each chapter contain information on:
Name of chapter and summary
Contents of the directory, including its subfolders
How to replicate results
Running the simulations
Simulation study
Collinearity study
EVS resampling study
Obtaining the plots and tables