Source code and data for the PhD Thesis "On-Premise Medical Information Extraction from German Doctor’s Letters under Clinical Constraints"

Dataset overview

This dataset contains source code and annotation guidelines used in the PhD thesis:

“On-Premise Medical Information Extraction from German Doctor’s Letters under Clinical Constraints”

Repository structure

The dataset is split into five repositories:

Source code for Chapter 2.6, “De-identification of German doctor’s letters”
Source code for Chapter 5, “Clinical Section Classification using Pretrained Language Models and Prompting”
Source code for Chapter 6, “Medication Information Extraction using Local Large Language Models”
Source code for Chapter 7, “Clinical Application: Medication Trends and Polypharmacy”
Annotation guidelines for Chapters 2.6, 4, 5, and 7

CARDIO:DE

The main dataset used for experiments in Chapters 5, 6, and 7:

CARDIO:DE - https://doi.org/10.11588/DATA/AFYQDY

Additional datasets (not included here)

Other datasets used include:

n2c2 2018 Track 2 (used in Chapter 6) - https://doi.org/10.1093/jamia/ocz166

Notes on additional data and model availability

With the exception of CARDIO:DE, the doctor’s letters from the cardiology domain used in Chapters 2, 5, 6, and 7 cannot be distributed due to data protection regulations. The same applies to all further-pretrained and fine-tuned models.

Identifier
DOI	https://doi.org/10.11588/DATA/USQLMB
Metadata Access	https://heidata.uni-heidelberg.de/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.11588/DATA/USQLMB

Provenance
Creator	Richter-Pechanski, Phillip
Publisher	heiDATA
Contributor	Richter-Pechanski, Phillip; Frank, Anette
Publication Year	2026
Rights	info:eu-repo/semantics/openAccess
OpenAccess	true
Contact	Richter-Pechanski, Phillip (Heidelberg University, Department of Computational Linguistics); Frank, Anette (Heidelberg University, Department of Computational Linguistics)

Representation
Resource Type	Dataset
Format	application/zip
Size (bytes)	1558244; 9755; 17163; 54167; 10754
Version	1.0
Discipline	Life Sciences; Medicine