Replication Data for: External validation of machine learning hyperparameters for systematic review screening prioritization

This repository contains all materials for the study "External validation of machine learning hyperparameters for systematic review screening prioritization".

This study evaluates model configurations for accelerating the screening phase of systematic reviews, focusing on the validation of hyperparameters in a newly optimized active learning framework. Forty-five curated datasets containing 202,177 scientific studies, of which 1.5% were included in systematic reviews, were used to assess model performance and sensitivity thresholds. Simulations were conducted using three model configurations, applying time-to-event analysis to track recall progress over screening effort.
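To illustrate what "tracking recall progress over screening effort" means in time-to-event terms, the sketch below treats each relevant record's "event time" as its position in the simulated screening order. It is a minimal illustration, not the study's analysis code. It assumes fully labeled records listed in the order a model presented them (1 = relevant, 0 = irrelevant). The function names are hypothetical and not part of ASReview's API.

```python
import math
from typing import List, Optional

# Hypothetical helpers for illustration; not part of ASReview's API.


def recall_curve(labels: List[int]) -> List[float]:
    """Recall after each screened record (labels in screening order)."""
    total = sum(labels)
    if total == 0:
        raise ValueError("no relevant records in this dataset")
    found = 0
    curve = []
    for label in labels:
        found += label
        curve.append(found / total)
    return curve


def event_times(labels: List[int]) -> List[int]:
    """1-based screening position at which each relevant record is found."""
    return [pos for pos, label in enumerate(labels, start=1) if label == 1]


def records_to_recall(labels: List[int], target: float = 0.95) -> Optional[int]:
    """Records screened until at least `target` recall is reached (0 < target <= 1)."""
    times = event_times(labels)
    if not times:
        return None
    # Number of relevant records that must be found; the small tolerance
    # keeps float rounding (e.g. 0.95 * n) from bumping the count up by one.
    needed = max(1, math.ceil(target * len(times) - 1e-9))
    return times[needed - 1]
```

For example, with `labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]` the event times are `[1, 3, 6]`, so 95% recall is first reached after six screened records.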

Results show that the lightweight elas_u4 model performs well while requiring significantly less computational power than elas_h3. In most cases, either screening one-fourth of all records or observing a streak of consecutive irrelevant records equal to 10% of the dataset is already sufficient to identify 95% of the relevant records. Moreover, the relationship between the screening fraction and the irrelevant streak suggests that combining the two metrics could further reduce screening effort without compromising recall.
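The two stopping heuristics above can be sketched as follows. This is a minimal illustration, not the study's code. It assumes fully labeled records in simulated screening order. It reads the streak rule as a run of consecutive irrelevant records whose length is 10% of the dataset, and it rounds both thresholds up. The function names and the integer-percentage parameters are hypothetical.

```python
from typing import List

# Hypothetical helpers for illustration; not part of ASReview's API.


def ceil_pct(n: int, pct: int) -> int:
    """Ceiling of pct% of n, using integer arithmetic to avoid float rounding."""
    return (pct * n + 99) // 100


def fraction_stop(labels: List[int], pct: int = 25) -> int:
    """Stop once pct% of all records (rounded up) have been screened."""
    return ceil_pct(len(labels), pct)


def streak_stop(labels: List[int], pct: int = 10) -> int:
    """Stop once a run of consecutive irrelevant records reaches pct% of all
    records; returns len(labels) if the streak never gets that long."""
    threshold = max(1, ceil_pct(len(labels), pct))
    run = 0
    for pos, label in enumerate(labels, start=1):
        # A relevant record resets the streak of irrelevant ones.
        run = run + 1 if label == 0 else 0
        if run >= threshold:
            return pos
    return len(labels)


def recall_at(labels: List[int], stop: int) -> float:
    """Fraction of all relevant records found within the first `stop` records."""
    total = sum(labels)
    if total == 0:
        raise ValueError("no relevant records in this dataset")
    return sum(labels[:stop]) / total
```

In a simulation, `recall_at(labels, fraction_stop(labels))` and `recall_at(labels, streak_stop(labels))` can then be compared against the 95% recall target for each dataset. Integer percentages are used so that thresholds such as 10% of 30 records come out as exactly 3 rather than being pushed to 4 by floating-point error.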

ASReview, 2.0

Identifier
DOI	https://doi.org/10.34894/YSXMVV
Metadata Access	https://dataverse.nl/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.34894/YSXMVV

Provenance
Creator	Teijema, Jelle Jasper
Publisher	DataverseNL
Contributor	Teijema, Jelle Jasper
Publication Year	2025
Rights	CC-BY-4.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by/4.0
OpenAccess	true
Contact	Teijema, Jelle Jasper (uu.nl)

Representation
Resource Type	Dataset
Format	text/csv; application/vnd.openxmlformats-officedocument.spreadsheetml.sheet; application/zip
Size	2909 B; 11303 B; 3448506846 B (about 3.4 GB)
Version	2.1
Discipline	Other