Source code and data for the PhD Thesis "On-Premise Medical Information Extraction from German Doctor’s Letters under Clinical Constraints"


Dataset overview

This dataset contains source code and annotation guidelines used in the PhD thesis:

“On-Premise Medical Information Extraction from German Doctor’s Letters under Clinical Constraints”

Repository structure

The dataset is split into five repositories:

- Source code for Chapter 2.6, "De-identification of German doctor’s letters"
- Source code for Chapter 5, "Clinical Section Classification using Pretrained Language Models and Prompting"
- Source code for Chapter 6, "Medication Information Extraction using Local Large Language Models"
- Source code for Chapter 7, "Clinical Application: Medication Trends and Polypharmacy"
- Annotation guidelines for Chapters 2.6, 4, 5, and 7

CARDIO:DE

The main dataset used for experiments in Chapters 5, 6, and 7:

CARDIO:DE - https://doi.org/10.11588/DATA/AFYQDY

Additional datasets (not included here)

Other datasets used include:

n2c2 2018 Track 2 (used in Chapter 6): https://doi.org/10.1093/jamia/ocz166

Notes on additional data and model availability

With the exception of CARDIO:DE, the cardiology doctor’s letters used in Chapters 2, 5, 6, and 7 cannot be distributed due to data protection regulations. The same restriction applies to all further-pretrained and fine-tuned models.

Identifier
DOI https://doi.org/10.11588/DATA/USQLMB
Metadata Access https://heidata.uni-heidelberg.de/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.11588/DATA/USQLMB
Provenance
Creator Richter-Pechanski, Phillip
Publisher heiDATA
Contributor Richter-Pechanski, Phillip; Frank, Anette
Publication Year 2026
Rights info:eu-repo/semantics/openAccess
OpenAccess true
Contact Richter-Pechanski, Phillip (Heidelberg University, Department of Computational Linguistics); Frank, Anette (Heidelberg University, Department of Computational Linguistics)
Representation
Resource Type Dataset
Format application/zip
Size (bytes) 1558244; 9755; 17163; 54167; 10754
Version 1.0
Discipline Life Sciences; Medicine