Pre-processing data for the Mega Meta Project

Understanding the etiology of, and the risk factors for, the onset, relapse, and chronicity of common mental disorders is key to identifying people at risk and improving preventive and acute treatment interventions. However, an overview of the evidence for factors that predict or are related to common mental disorders is lacking. Because of the sheer volume of literature (a big data problem), it is impossible to synthesize all evidence using traditional systematic reviews.

The Mega-Meta project, funded by the Centre for Urban Mental Health and a cooperation between Amsterdam UMC, University of Utrecht, and University of Amsterdam, is a large systematic review that aims to synthesize (meta-analyze) all prospective evidence for factors, mechanisms of change, and interactions of factors related to the onset, maintenance, and relapse/recurrence of three common mental disorders: anxiety, substance use, and depressive disorders. The systematic searches, selection, and data checks were conducted using ASReview between June 2021 and July 2022.

This DANS dataset is the result of https://github.com/asreview/paper-megameta-postprocessing-screeningresults

The MegaMeta Output files

This repository contains the output files and the final data of the so-called Mega-Meta study, which reviews factors contributing to substance use, anxiety, and depressive disorders. The scripts used to generate the output can be found here: https://doi.org/10.5281/zenodo.5803268 (https://github.com/asreview/paper-megameta-postprocessing-screeningresults)

Output files

The Steps & Files

Each step consists of input and output files. Only the output files are stored in this data repository. The output of one step serves as the input for the next step. The final result of the project can be found in the Final Data folder.

To replicate the study up until the final results, you can follow these steps:

1. Search:

The input and protocol for the search strategy can be found on this Open Science Framework
(OSF) repository: https://doi.org/10.17605/OSF.IO/M5UHY.

1_Search_Output
This folder contains RIS formatted .txt files for each of the 
three subjects: substance use, anxiety, and depressive disorders.
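
For readers who want to inspect the search output programmatically, the following is a minimal sketch of reading RIS-formatted text in Python. It is not part of the project's own scripts. It assumes the standard RIS layout (a two-character tag, two spaces, a hyphen, and a value, with "ER" closing each record) and ignores continuation lines; the sample record is hypothetical.

```python
import re
from collections import defaultdict

# An RIS line looks like "TY  - JOUR": tag, two spaces, hyphen, value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def parse_ris(text):
    """Parse RIS-formatted text into a list of {tag: [values]} dictionaries."""
    records = []
    current = defaultdict(list)
    for line in text.splitlines():
        match = RIS_LINE.match(line.strip("\ufeff\r\n"))
        if not match:
            continue  # this sketch skips blank and continuation lines
        tag, value = match.group(1), match.group(2).strip()
        if tag == "ER":  # "ER" marks the end of a record
            if current:
                records.append(dict(current))
            current = defaultdict(list)
        else:
            current[tag].append(value)
    if current:
        records.append(dict(current))  # tolerate a missing final "ER"
    return records


# Hypothetical record, for illustration only.
sample = "\n".join([
    "TY  - JOUR",
    "AU  - Doe, J.",
    "AU  - Roe, R.",
    "TI  - A hypothetical title",
    "DO  - 10.1234/example",
    "ER  - ",
])
records = parse_ris(sample)
print(len(records), records[0]["AU"])
```

Repeated tags such as AU are collected into lists, so multi-author records keep every author.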

2. Preprocessing:

The output from the previous step, the search, is the input for the preprocessing step.
More information about the preprocessing scripts and protocol can also be found within 
the OSF repository mentioned above: https://doi.org/10.17605/OSF.IO/M5UHY

In short, the preprocessing consists of:
- Updating the references in EndNote
- Deduplicating the references in EndNote
- Deduplicating the references based on DOI in R
- Labeling inclusions and exclusions, which are to be used as prior knowledge.
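
The DOI-based deduplication listed above was done in R in the original project. The Python sketch below only illustrates the general idea and is not the project's script. The field name "doi", the normalization rules, and the choice to keep the first occurrence are assumptions made for this example.

```python
def normalize_doi(raw):
    """Lowercase a DOI and strip common resolver prefixes; return '' if missing."""
    doi = (raw or "").strip().lower()
    prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    for prefix in prefixes:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi.strip()


def deduplicate_on_doi(rows, doi_field="doi"):
    """Keep the first row per normalized DOI; rows without a DOI are always kept."""
    seen = set()
    unique = []
    for row in rows:
        doi = normalize_doi(row.get(doi_field))
        if doi and doi in seen:
            continue  # a record with this DOI was already kept
        if doi:
            seen.add(doi)
        unique.append(row)
    return unique


# Hypothetical records, for illustration only.
rows = [
    {"title": "A", "doi": "10.1000/ABC"},
    {"title": "A (duplicate)", "doi": "https://doi.org/10.1000/abc"},
    {"title": "B", "doi": ""},
    {"title": "C", "doi": ""},
]
deduplicated = deduplicate_on_doi(rows)
print([row["title"] for row in deduplicated])
```

Records without a DOI are never merged with each other here, because an empty DOI carries no evidence that two records are the same.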

2_Preprocessing_Output
This folder contains three .csv files, or in other words three datasets. 
These datasets have been partly labeled, meaning that some of the records have been 
labeled as either relevant or irrelevant. These labeled records are also known 
as the prior knowledge, which is necessary for the next step.
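
As an illustration of what "partly labeled" means in practice, the sketch below splits records into relevant, irrelevant, and unlabeled groups. The column name "label_included" and the coding 1 / 0 / empty are assumptions for this example and should be checked against the actual .csv files.

```python
import csv
import io


def split_prior_knowledge(rows, label_field="label_included"):
    """Split rows into (relevant, irrelevant, unlabeled) based on a label column."""
    relevant, irrelevant, unlabeled = [], [], []
    for row in rows:
        label = (row.get(label_field) or "").strip()
        if label == "1":
            relevant.append(row)
        elif label == "0":
            irrelevant.append(row)
        else:
            unlabeled.append(row)  # not yet screened
    return relevant, irrelevant, unlabeled


# Hypothetical data; the column name is an assumption.
sample_csv = (
    "title,label_included\n"
    "Hypothetical relevant record,1\n"
    "Hypothetical irrelevant record,0\n"
    "Hypothetical unscreened record,\n"
)
rows = list(csv.DictReader(io.StringIO(sample_csv)))
relevant, irrelevant, unlabeled = split_prior_knowledge(rows)
print(len(relevant), len(irrelevant), len(unlabeled))
```

The relevant and irrelevant groups together correspond to the prior knowledge; the unlabeled group is what remains to be screened.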

3. Screening phase 1:

The input for the first screening phase consists of the partly labeled datasets from step 2.
The screening protocol used in screening phase 1 can be found here:
https://doi.org/10.17605/OSF.IO/3ZNAR.

3_ScreeningPhase1_Output
This folder contains six files: an .xlsx and an .asreview file for each of the subjects.
The .xlsx file is a human-readable dataset containing the screening decisions made by
the screeners in screening phase 1. The .asreview file is a project file that can be
uploaded to ASReview LAB to see all the decisions that have been made within the software
itself. It also contains all the information on the trained model and settings up until that
point.
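
Besides uploading it to ASReview LAB, an .asreview project file can be inspected directly. The sketch below assumes the file is a zip archive, which should be verified for the ASReview version used here. The member names in the stand-in archive are hypothetical and say nothing about the real internal layout.

```python
import io
import zipfile


def list_project_members(file_or_path):
    """Return the sorted member names of a zip-based project file, or [] if not a zip."""
    if not zipfile.is_zipfile(file_or_path):
        return []
    with zipfile.ZipFile(file_or_path) as archive:
        return sorted(archive.namelist())


# Build a tiny in-memory stand-in for a project file (hypothetical contents).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("project.json", "{}")
    archive.writestr("data/records.csv", "title\n")
buffer.seek(0)

members = list_project_members(buffer)
print(members)
```

For a real file, pass its path instead of the in-memory buffer.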

4. Screening phase 2:

The input for the second screening phase consists of the .xlsx files from the first screening phase.
However, in the second screening phase, a different machine learning model, a 17-layer
Convolutional Neural Network, was used to optimize the screening process.
Find out more about the different model and how the hyperparameters were trained
in the GitHub repository:
https://github.com/asreview/paper-megameta-hyperparameter-training

4_ScreeningPhase2_Output
Similar to the previous step, six files are present in the 4_ScreeningPhase2_Output folder:
an .xlsx and an .asreview file per subject (anxiety, depression, and substance use).
These files contain both the screening decisions from the first and from the second 
screening phase.

5. Postprocessing:

The input for the postprocessing steps consists of the .xlsx files from the second screening phase.
Read more about the postprocessing steps within this GitHub repository:
https://github.com/asreview/paper-megameta-postprocessing-screeningresults

5_Postprocessing_Output
Postprocessing produces several intermediate files, each of which serves as input
for the next part of the postprocessing pipeline.
- Merging the three .xlsx files results in: megameta_asreview_merged.xlsx
- Retrieving missing DOIs results in: megameta_asreview_doi_retrieved.xlsx
- Deduplication based on DOI and a conservative deduplication strategy results in:
  megameta_asreview_deduplicated.xlsx
- The quality of the labels is checked in two stages. The first stage checks for falsely excluded records 
  and the second for falsely included records. Together they result in 
  megameta_asreview_quality_checked.xlsx
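
The first two parts of this pipeline (merging the per-subject files and finding records that still lack a DOI) can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation. The field names "doi" and "subject" and the sample records are hypothetical, and plain dictionaries stand in for the rows of the .xlsx files.

```python
def merge_subjects(datasets):
    """Concatenate per-subject record lists, tagging each row with its subject."""
    merged = []
    for subject, rows in datasets.items():
        for row in rows:
            merged.append({**row, "subject": subject})
    return merged


def missing_doi(rows, doi_field="doi"):
    """Return the rows whose DOI is absent or blank (candidates for DOI retrieval)."""
    return [row for row in rows if not (row.get(doi_field) or "").strip()]


# Hypothetical records, for illustration only.
datasets = {
    "anxiety": [{"title": "A", "doi": "10.1000/a"}],
    "depression": [{"title": "B", "doi": ""}],
    "substance use": [{"title": "C", "doi": None}, {"title": "D", "doi": "10.1000/d"}],
}
merged = merge_subjects(datasets)
to_retrieve = missing_doi(merged)
print(len(merged), [row["title"] for row in to_retrieve])
```

Tagging each row with its subject before merging keeps the origin of every record traceable after the three files have been combined.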


If you want access to the database, please contact Dr. Brouwer: m.e.brouwer@amsterdamumc.nl

Identifier
DOI https://doi.org/10.17026/dans-29d-n6yg
PID https://nbn-resolving.org/urn:nbn:nl:ui:13-gh-kucu
Related Identifier https://doi.org/10.5281/zenodo.5752357
Related Identifier https://doi.org/10.17605/OSF.IO/M5UHY
Related Identifier https://doi.org/10.17605/OSF.IO/3ZNAR
Related Identifier https://doi.org/10.5281/zenodo.5768305
Related Identifier https://doi.org/10.5281/zenodo.5747049
Related Identifier https://doi.org/10.17026/dans-z7w-9446
Related Identifier https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42021266297
Metadata Access https://easy.dans.knaw.nl/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=oai:easy.dans.knaw.nl:easy-dataset:254745
Provenance
Creator Brouwer, M.E.
Publisher Data Archiving and Networked Services (DANS)
Contributor Hofstee, L.; Brand, S. van den; Teijema, J.; Melnikov, V.; Ferdinands, G.; Kramer, B.; Boer, J. de; Weijdema, F.; Lucassen, P.; Sloot, P.; Stronks, K.; Weert, J. van; Wiers, R.; Bockting, C.; Schoot, R. van de; Bruin, J. de; Prof. C. Bockting (Amsterdam UMC, University of Amsterdam, Centre for Urban Mental Health)
Publication Year 2024
Rights info:eu-repo/semantics/restrictedAccess; License: http://dans.knaw.nl/en/about/organisation-and-policy/legal-information/DANSLicence.pdf
OpenAccess false
Representation
Language English
Resource Type Dataset
Format text/plain; csv
Discipline Life Sciences; Medicine