Anonymize or Synthesize? – Privacy-Preserving Methods for Heart Failure Score Analytics [data]

In the publication [1], we implemented anonymization and synthetization techniques for a structured data set that was collected during the HiGHmed Use Case Cardiology study [2]. We employed the data anonymization tool ARX [3] and the data synthetization framework ASyH [4], both individually and in combination. We evaluated the utility and shortcomings of the different approaches through statistical analyses and privacy risk assessments. Data utility was assessed by computing two heart failure risk scores (Barcelona BioHF [5] and MAGGIC [6]) on the protected data sets. We observed only minimal deviations from the scores computed on the original data set. Additionally, we performed a re-identification risk analysis and found only minor residual risks for common types of privacy threats.
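
As a rough illustration of the utility comparison described above, the following minimal Python sketch contrasts summary statistics of a risk score between an original and a protected data set. It is not the analysis code used in the publication: the function name `score_deviation` and the example values are hypothetical, and summary statistics are compared (rather than record-wise differences) on the assumption that a synthesized data set need not contain the same number of records as the original.

```python
from statistics import mean, median


def score_deviation(original, protected):
    """Return the shift in mean and median of a risk score
    between the original and a protected data set."""
    if not original or not protected:
        raise ValueError("both score lists must be non-empty")
    return {
        "mean_diff": mean(protected) - mean(original),
        "median_diff": median(protected) - median(original),
    }


# Hypothetical score values for illustration only (not study data).
original_scores = [10.0, 20.0, 30.0, 40.0]
protected_scores = [12.0, 18.0, 33.0, 41.0]

deviation = score_deviation(original_scores, protected_scores)
print(deviation)
```

Values close to zero for both entries would suggest that the protected data set reproduces the central tendency of the score; a fuller comparison would also look at the spread and shape of the distributions.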

We demonstrated that anonymization and synthetization methods protect privacy while retaining data utility for heart failure risk assessment. Both approaches, and a combination thereof, introduce only minimal deviations from the original data set across all features. While data synthesis techniques can produce any number of new records, data anonymization techniques offer more formal privacy guarantees. Consequently, data synthesis on anonymized data further enhances privacy protection with little impact on data utility. We hereby share all generated data sets with the scientific community through a use and access agreement.
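
To make the notion of a formal privacy guarantee more concrete, the sketch below computes k-anonymity and the corresponding prosecutor re-identification risks from equivalence class sizes. k-anonymity is one of the privacy models supported by ARX; whether it is the model applied to these data sets is not stated here, so this is an assumption made for illustration. The function name `prosecutor_risks`, the quasi-identifier columns `age` and `sex`, and the example records are all hypothetical.

```python
from collections import Counter


def prosecutor_risks(records, quasi_identifiers):
    """Group records by their quasi-identifier values and derive
    k (smallest group size) and prosecutor re-identification risks."""
    if not records:
        raise ValueError("records must be non-empty")
    # One key per record: the tuple of its quasi-identifier values.
    sizes = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    k = min(sizes.values())
    return {
        "k": k,
        # Risk for a record in the smallest equivalence class.
        "highest_risk": 1 / k,
        # Mean of 1/class_size over all records = classes / records.
        "average_risk": len(sizes) / len(records),
    }


# Hypothetical generalized records for illustration only (not study data).
records = [
    {"age": "60-69", "sex": "m"},
    {"age": "60-69", "sex": "m"},
    {"age": "70-79", "sex": "f"},
    {"age": "70-79", "sex": "f"},
    {"age": "70-79", "sex": "f"},
]

risks = prosecutor_risks(records, ["age", "sex"])
print(risks)
```

A data set is k-anonymous when every combination of quasi-identifier values is shared by at least k records, which bounds the prosecutor risk for any single record by 1/k. Purely synthetic data does not come with a bound of this kind by construction.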

[1] Johann TI, Otte K, Prasser F, Dieterich C. Anonymize or synthesize? Privacy-preserving methods for heart failure score analytics. Eur Heart J Digit Health 2024. doi:10.1093/ehjdh/ztae083

[2] Sommer KK, Amr A, Bavendiek, Beierle F, Brunecker P, Dathe H, et al. Structured, harmonized, and interoperable integration of clinical routine data to compute heart failure risk scores. Life (Basel) 2022;12:749.

[3] Prasser F, Eicher J, Spengler H, Bild R, Kuhn KA. Flexible data anonymization using ARX—current status and challenges ahead. Softw Pract Exper 2020;50:1277–1304.

[4] Johann TI, Wilhelmi H. ASyH—anonymous synthesizer for health data, GitHub, 2023. Available at: https://github.com/dieterich-lab/ASyH.

[5] Lupón J, de Antonio M, Vila J, Peñafiel J, Galán A, Zamora E, et al. Development of a novel heart failure risk tool: the Barcelona bio-heart failure risk calculator (BCN Bio-HF calculator). PLoS One 2014;9:e85466.

[6] Pocock SJ, Ariti CA, McMurray JJV, Maggioni A, Køber L, Squire IB, et al. Predicting survival in heart failure: a risk score based on 39 372 patients from 30 studies. Eur Heart J 2013;34:1404–1413.

Identifier
DOI https://doi.org/10.11588/data/MXM0Q2
Related Identifier IsCitedBy https://doi.org/10.1093/ehjdh/ztae083
Metadata Access https://heidata.uni-heidelberg.de/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.11588/data/MXM0Q2
Provenance
Creator Johann, Tim Ingo; Otte, Karen; Prasser, Fabian; Dieterich, Christoph
Publisher heiDATA
Contributor Dieterich, Christoph; Johann, Tim; heiDATA: Heidelberg Research Data Repository
Publication Year 2024
Rights info:eu-repo/semantics/restrictedAccess
OpenAccess false
Contact Dieterich, Christoph (University Hospital Heidelberg, Klaus-Tschira-Institute for Computational Cardiology, Bioinformatics, Internal Medicine III); Johann, Tim (University Hospital Heidelberg, Klaus-Tschira-Institute for Computational Cardiology, Bioinformatics, Internal Medicine III)
Representation
Resource Type Dataset
Format application/pdf; text/tab-separated-values; text/plain
Size 640128; 190296; 197975; 286102; 106632; 107100; 191831; 3421
Version 1.0
Discipline Life Sciences; Medicine