Real-time identification of patients with metastatic breast cancer from electronic health records

Background: To develop a screening algorithm to identify patients with metastatic breast cancer (mBC) among all breast cancer cases from electronic health records (EHRs).

Methods: The text mining software tool CTcue (IQVIA Business) was used to search the EHRs. The screening algorithm comprised five criteria: the term ‘metastatic’, a relevant treatment procedure, a relevant radiology procedure, a tumour marker CA15.3 measurement, and the term ‘hospice’. If at least two criteria were met, the patient was suspected to have mBC. The screening algorithm was developed and validated in two SONABRE registry (NCT-03577197) hospitals for the incidence years 2019 and 2020. Manual screening was considered the gold standard. The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and number needed to screen (NNS) were calculated. Three scenarios were evaluated: 1) identification of prevalent and incident cases, 2) first identification of incident cases, and 3) subsequent identification of incident cases (i.e., after excluding previously identified cases).
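The two-criteria rule and the accuracy measures named above can be sketched as follows. This is a minimal illustration, not the CTcue implementation: the criterion keys, the helpers `is_suspected_mbc` and `screening_metrics`, and the toy counts are hypothetical, and NNS is assumed here to mean the number of flagged patients reviewed per confirmed case (1/PPV).

```python
from dataclasses import dataclass

# The five criteria named in the Methods; the key names are hypothetical.
CRITERIA = (
    "metastatic_term",
    "treatment_procedure",
    "radiology_procedure",
    "ca15_3_measurement",
    "hospice_term",
)


def is_suspected_mbc(hits: dict[str, bool], min_criteria: int = 2) -> bool:
    """Flag a patient when at least `min_criteria` criteria were found in the EHR."""
    found = sum(bool(hits.get(name, False)) for name in CRITERIA)
    return found >= min_criteria


@dataclass
class Metrics:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    nns: float


def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> Metrics:
    """Accuracy of the algorithm against manual screening (the gold standard)."""
    ppv = tp / (tp + fp)
    return Metrics(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        ppv=ppv,
        npv=tn / (tn + fn),
        nns=1 / ppv,  # assumption: flagged patients reviewed per true case
    )


# Toy inputs for illustration only; these are not study data.
patient = {"metastatic_term": True, "ca15_3_measurement": True}
print(is_suspected_mbc(patient))  # True: two criteria met
print(screening_metrics(tp=80, fp=20, fn=0, tn=900))
```

Keeping the threshold as a parameter (`min_criteria`) reflects that the rule would need local translation and validation before reuse at another institution.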

Results: Among the 6629 breast cancer patients who visited the hospitals, 1323 (20%) were identified by the screening algorithm as suspected mBC cases. In total, 941 patients (14%) had mBC, of whom 753 (11%) were prevalent and 188 (3%) were incident cases. The sensitivity, specificity, PPV, NPV and NNS of the screening algorithm were 1) 100%, 93%, 71%, 100% and 1.4 for identifying prevalent and incident cases, 2) 99%, 81%, 13%, 100% and 7.5 for the first identification of incident cases, and 3) 99%, 94%, 33%, 100% and 3.1 for the subsequent identification of incident cases. Overall, 2 cases were missed and 14 additional cases were identified with the screening algorithm.
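As a quick arithmetic check, the reported percentages can be recomputed from the counts in the Results. The second part tests whether each reported NNS is compatible with 1/PPV once rounding is allowed for; treating NNS as 1/PPV is an assumption made here, not something the record states, and the helper names are hypothetical.

```python
# Counts and rounded values as reported in the Results.
total, flagged, mbc, prevalent, incident = 6629, 1323, 941, 753, 188

assert prevalent + incident == mbc


def pct(part: int, whole: int) -> int:
    """Percentage rounded to a whole number, as in the abstract."""
    return round(100 * part / whole)


print(pct(flagged, total))    # 20
print(pct(mbc, total))        # 14
print(pct(prevalent, total))  # 11
print(pct(incident, total))   # 3


def nns_range(ppv_pct: int) -> tuple[float, float]:
    """Range of 1/PPV compatible with a PPV rounded to a whole percent."""
    low_ppv = (ppv_pct - 0.5) / 100
    high_ppv = (ppv_pct + 0.5) / 100
    return 1 / high_ppv, 1 / low_ppv


def nns_consistent(ppv_pct: int, reported_nns: float) -> bool:
    """True if the reported NNS (one decimal) overlaps the 1/PPV range."""
    low, high = nns_range(ppv_pct)
    return reported_nns + 0.05 >= low and reported_nns - 0.05 <= high


# (scenario, reported PPV in %, reported NNS)
scenarios = [(1, 71, 1.4), (2, 13, 7.5), (3, 33, 3.1)]
for number, ppv_pct, reported_nns in scenarios:
    print(number, round(100 / ppv_pct, 2), nns_consistent(ppv_pct, reported_nns))
```

Under that assumption, all three reported NNS values fall within the range implied by the rounded PPVs.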

Conclusions: The developed screening algorithm for the automatic and real-time identification of patients with mBC is considered valid and efficient. Manual screening could be reduced by at least 80%; the algorithm identified an additional 14 mBC patients and missed only 2 mBC cases. This algorithm can be used to identify eligible patients for inclusion in clinical trials and observational studies, after careful translation, validation and annual update of the algorithm per institution.
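One plausible reading of the 80% figure is the share of patients the algorithm did not flag, and who would therefore not need manual review. The sketch below recomputes it from the counts in the Results; this interpretation is an assumption, not a definition given in the record.

```python
# Counts as reported in the Results.
total_patients = 6629
flagged_by_algorithm = 1323

# Assumption: only flagged patients still require manual screening.
not_flagged = total_patients - flagged_by_algorithm
reduction = not_flagged / total_patients

print(not_flagged)                 # 5306
print(round(100 * reduction, 1))   # 80.0
```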

Enclosed documents: Screening algorithms, abstract and poster.

CTcue - MUMC+ - SAP, 4.3.2 - 4.6.0

CTcue - Jeroen Bosch Hospital - HIX, 4.11.1 - 4.13.1

CTcue - MUMC+ - EPIC, 4.15.2 - 4.16.0

Identifier
DOI https://doi.org/10.34894/ATHQDF
Related Identifier References https://doi.org/10.1016/j.esmoop.2025.104902
Metadata Access https://dataverse.nl/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.34894/ATHQDF
Provenance
Creator Geurts, Sandra; de Fallois, Aude; Vriens, Ingeborg; Gillissen, C; Dammers, J.T.; Tol, Jolien; Tjan-Heijnen, Vivianne
Publisher DataverseNL
Contributor Geurts, Sandra; Maastricht University Medical Centre
Publication Year 2026
Funding Reference MUMC+ Data2Care fellowship 2023-2025
Rights CC-BY-4.0; info:eu-repo/semantics/restrictedAccess; http://creativecommons.org/licenses/by/4.0
OpenAccess false
Contact Geurts, Sandra (Maastricht University Medical Centre)
Representation
Resource Type Poster; Dataset
Format application/pdf
Size 412777; 335823; 263056
Version 1.0
Discipline Life Sciences; Medicine
Spatial Coverage Maastricht; 's-Hertogenbosch