Replication Data for: Simplifying Imputation with Many Predictors in MICE Using Principal Component Analysis

This is the data package for the Ph.D. thesis "Simplifying Imputation with Many Predictors in MICE Using Principal Component Analysis" by Edoardo Costantini.

MD files in this dataset can be opened using any text editor. The __MACOSX folder in the dataset can be ignored, as it was created automatically by OS X when creating the zip file. The general README.md file in this dataset describes the contents of the data package, which contains five folders:

- chapter-2 contains the code and data files to reproduce the results for the second chapter of the thesis.
- chapter-3 contains the code and data files to reproduce the results for the third chapter of the thesis.
- chapter-4 contains the code and data files to reproduce the results for the fourth chapter of the thesis.
- chapter-5 contains the code and data files to reproduce the results for the fifth chapter of the thesis.
- thesis contains the code and data files to reproduce the exact figures and tables in the thesis.
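After unzipping, the folder layout described above can be checked with a few lines of shell. This is only a minimal sketch: the function name check_package_layout and the idea of passing the unzipped directory as an argument are assumptions for illustration, not part of the data package.

```shell
# Hypothetical helper (not part of the data package): report which of the
# five expected folders are missing from an unzipped copy of the package.
check_package_layout() {
  root=$1
  missing=0
  for d in chapter-2 chapter-3 chapter-4 chapter-5 thesis; do
    if [ ! -d "$root/$d" ]; then
      echo "missing: $d"
      missing=1
    fi
  done
  # Exit status 0 when all folders are present, 1 otherwise.
  return "$missing"
}
```

Calling check_package_layout with the path to the unzipped package prints one line per missing folder and returns a non-zero status if any folder is absent.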

Metadata and data collection

For every chapter in the thesis, the record of who worked on which files is reported as a .csv file registering the git log of the corresponding project. The .csv files were obtained with the following Zsh commands:

    echo changes, date, contributor, message > log-changes.csv
    git log --name-status --pretty=format:%h,%ai,%aL,%s >> log-changes.csv
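A log-changes.csv file produced this way mixes commit lines (hash, date, contributor, message) with the file-status lines added by --name-status, so it is not a strictly rectangular table. As a rough sketch of how it might be summarised, the function below counts commits per contributor. The function name commits_per_contributor is hypothetical, and the sketch assumes that commit lines start with an abbreviated hash of at least seven hexadecimal digits, which is git's default minimum.

```shell
# Hypothetical helper: count commits per contributor in a log-changes.csv.
# Assumption: commit lines begin with an abbreviated hash (7+ hex digits),
# as written by the %h,%ai,%aL,%s format; file-status lines and the header
# line do not match and are skipped.
commits_per_contributor() {
  awk -F, '
    # The third comma-separated field of a commit line is the contributor.
    $1 ~ /^[0-9a-f]+$/ && length($1) >= 7 && NF >= 4 { n[$3]++ }
    END { for (c in n) print n[c] "," c }
  ' "$1" | sort -t, -k1,1nr -k2,2
}
```

Running commits_per_contributor log-changes.csv prints one count,contributor pair per line, with the most active contributor first. Commit messages that contain commas do not affect the count, because only the first and third fields are inspected.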

Two main datasets were used throughout the chapters of the thesis:

- European Values Study questionnaire data, as available on the GESIS portal (https://doi.org/10.4232/1.13511).
- Fireworks disaster data, as made available in the CRAN version of the R package mice, originally collected by de Roos, C., Greenwald, R., den Hollander-Gijsman, M., Noorthoorn, E., van Buuren, S., & de Jongh, A. (2011). A randomised comparison of cognitive behavioural therapy (CBT) and eye movement desensitisation and reprocessing (EMDR) in disaster-exposed children. European Journal of Psychotraumatology, 2(1). https://doi.org/10.3402/ejpt.v2i0.5694

The rest of the data used in this thesis were simulated by computer scripts to evaluate the properties of different statistical approaches.

Raw database

The README file in every folder provides instructions on how to load the data used and which computer scripts to run to process the data. The versions of the data used for the analysis by the author are available in the respective chapter folders with one exception. The European Values Study data used in Chapters 2 and 4 could not be shared because of explicit terms of use limitations. However, the data are stored by GESIS and made available via password-protected online access, for which user registration is required. The DOI https://doi.org/10.4232/1.13511, also provided in the chapters using these data, points to the exact version of the data file.

Material

All material is available in the chapter folders. The README files contained in every chapter folder describe how to reproduce results.

Statistical processing

All computer scripts used to perform statistical processing are available in the chapter folders. The README files contained in every chapter folder describe how to reproduce results.

Processed database

All processed data used to generate the figures and tables in the thesis are available in the folder thesis/data/.

Accepted or published manuscript or publication

DOIs for the three published chapters:

Chapter 2: https://doi.org/10.1177/00491241231200194
Chapter 3: https://doi.org/10.3758/s13428-023-02117-1
Chapter 4: https://doi.org/10.48550/arXiv.2309.01608

Disclaimer

If any issues arise when replicating results by following the instructions provided in this data package, please refer to the instructions described in the chapters of the thesis.

The README files of each chapter contain information on:

- Name of chapter and summary
- Contents of the directory, including mention of the subfolders
- How to replicate results
  - Running the simulations
    - Simulation study
    - Collinearity study
    - EVS resampling study
  - Obtaining the plots and tables

Identifier
DOI https://doi.org/10.34894/5CZLAW
Related Identifier IsCitedBy https://doi.org/10.1177/00491241231200194
Related Identifier IsCitedBy https://doi.org/10.3758/s13428-023-02117-1
Related Identifier IsCitedBy https://doi.org/10.48550/arXiv.2309.01608
Metadata Access https://dataverse.nl/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.34894/5CZLAW
Provenance
Creator Costantini, Edoardo
Publisher DataverseNL
Contributor Costantini, Edoardo; Tilburg University; DataverseNL
Publication Year 2024
Rights info:eu-repo/semantics/openAccess
OpenAccess true
Contact Costantini, Edoardo (Tilburg University)
Representation
Resource Type Miscellaneous data (R scripts, R packages, databases, documentation); Dataset
Format application/zip
Size 15738221
Version 1.0
Discipline Agriculture, Forestry, Horticulture, Aquaculture; Agriculture, Forestry, Horticulture, Aquaculture and Veterinary Medicine; Life Sciences; Social Sciences; Social and Behavioural Sciences; Soil Sciences