Replication Data for: External validation of machine learning hyperparameters for systematic review screening prioritization

This repository contains all materials for the study "External validation of machine learning hyperparameters for systematic review screening prioritization".

This study evaluates model configurations for accelerating the screening phase of systematic reviews, focusing on the validation of hyperparameters in a newly optimized active learning framework. Forty-five curated datasets containing 202,177 scientific studies, of which 1.5% were included in systematic reviews, were used to assess model performance and sensitivity thresholds. Simulations were conducted using three model configurations, applying time-to-event analysis to track recall progress over screening effort.

Results show that the lightweight elas_u4 model provides strong performance while requiring significantly less computational power than elas_h3. In most cases, screening either one-fourth of the total records or observing a consecutive streak of 10% irrelevant records is already sufficient to identify 95% of the relevant records. However, the relationship between screening fraction and irrelevant streaks suggests that combined metrics could further reduce screening effort without compromising recall.
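The two stopping heuristics mentioned above can be illustrated with a minimal Python sketch. It is not taken from the replication materials and does not use the ASReview API. The function names (stopping_point, recall_at) are hypothetical. The sketch also assumes that a "streak of 10%" means a run of consecutive irrelevant records whose length equals 10% of the dataset size. Labels are given in screening order, with 1 for relevant and 0 for irrelevant.

```python
import math
from typing import Sequence


def stopping_point(labels: Sequence[int],
                   max_fraction: float = 0.25,
                   streak_fraction: float = 0.10) -> int:
    """Return how many records are screened when either rule fires.

    Rule 1: a fixed fraction of all records has been screened.
    Rule 2: a run of consecutive irrelevant records reaches a length
            equal to streak_fraction of the dataset (assumed reading).
    """
    n = len(labels)
    budget = math.ceil(max_fraction * n)
    streak_needed = math.ceil(streak_fraction * n)
    streak = 0
    for screened, label in enumerate(labels, start=1):
        # A relevant record resets the irrelevant streak.
        streak = 0 if label == 1 else streak + 1
        if screened >= budget or streak >= streak_needed:
            return screened
    return n


def recall_at(labels: Sequence[int], screened: int) -> float:
    """Fraction of all relevant records found in the first `screened` records."""
    total = sum(labels)
    return sum(labels[:screened]) / total if total else 0.0


# Toy ranking of 40 records: the relevant ones appear early.
ranked = [1, 1, 0, 1] + [0] * 36
stop = stopping_point(ranked)
print(stop, recall_at(ranked, stop))
```

In this toy ranking the streak rule fires before the one-fourth budget is reached. Applying the same functions to a simulated screening order from one of the datasets would show which rule triggers first and what recall it yields.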

ASReview, 2.0

Identifier
DOI https://doi.org/10.34894/YSXMVV
Metadata Access https://dataverse.nl/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.34894/YSXMVV
Provenance
Creator Teijema, Jelle Jasper
Publisher DataverseNL
Contributor Teijema, Jelle Jasper
Publication Year 2025
Rights CC-BY-4.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by/4.0
OpenAccess true
Contact Teijema, Jelle Jasper (uu.nl)
Representation
Resource Type Dataset
Format text/csv; application/vnd.openxmlformats-officedocument.spreadsheetml.sheet; application/zip
Size 2909; 11303; 3448506846
Version 2.1
Discipline Other