NLP in Diagnostic Texts from Nephropathology [Research Data]

This data set contains all annotated topic word tables from the work "NLP in Diagnostic Texts from Nephropathology", as well as all pre-processed and tf-idf-vectorized text files. The raw texts (i.e., the descriptive and diagnostic sections) are deliberately not made available, because it cannot be ruled out that the patient or the person who wrote the report could be identified from them. This restriction is in accordance with our local ethics committee.

Please note: This data set is not yet complete and will be completed soon.

Please refer to chapter 3.1.2 of our paper to learn how to interpret the annotated topic word tables.

The associated GitLab project http://gitlab.medma.uni-heidelberg.de/mlegnar/nlp-in-diagnostic-texts-from-nephropathology contains some examples of how the .pkl files can be opened and used with Python.
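
For orientation, a minimal sketch of opening a .pkl file with Python's standard pickle module is given here. It is not taken from the GitLab project; the helper names load_pickle and describe and the file name in the usage comment are hypothetical. The sketch assumes that each file holds a single pickled object. If the files were written with third-party libraries, those libraries probably need to be installed before loading. Unpickling can execute arbitrary code, so only files from a trusted source should be loaded.

```python
import pickle
from pathlib import Path
from typing import Any, Union


def load_pickle(path: Union[str, Path]) -> Any:
    """Load a single pickled object from disk (assumes one object per file)."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def describe(obj: Any) -> str:
    """Summarize a loaded object by its type and, if available, its shape or length."""
    summary = type(obj).__name__
    shape = getattr(obj, "shape", None)
    if shape is not None:
        # Array-like or matrix-like objects usually expose a shape attribute.
        return f"{summary}, shape={shape}"
    if hasattr(obj, "__len__"):
        return f"{summary}, len={len(obj)}"
    return summary


# Usage (hypothetical file name, replace with a file from this data set):
#   obj = load_pickle("example.pkl")
#   print(describe(obj))
```

Inspecting the type first is a cautious way to find out what a file contains before working with it, since the internal structure of the files is not described on this page.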

Identifier
DOI https://doi.org/10.11588/data/KS5W0H
Metadata Access https://heidata.uni-heidelberg.de/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.11588/data/KS5W0H
Provenance
Creator Legnar, Maximilian; Daumke, Philipp; Hesser, Jürgen; Porubsky, Stefan; Popovic, Zoran; Bindzus, Jan Niklas; Siemoneit, Joern-Helge; Weis, Cleo-Aron
Publisher heiDATA
Contributor Legnar, Maximilian; Weis, Cleo-Aron; Institute of Pathology, Medical Faculty Mannheim, Heidelberg University; Bindzus, Jan Niklas
Publication Year 2022
Rights CC BY 4.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by/4.0
OpenAccess true
Contact Legnar, Maximilian (Institute of Pathology, Medical Faculty Mannheim, Heidelberg University, Germany); Weis, Cleo-Aron (Institute of Pathology, Medical Faculty Mannheim, Heidelberg University, Germany)
Representation
Resource Type Dataset
Format application/octet-stream; application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Size 658789; 875657; 316789; 475357; 26482; 26513; 25989; 25160; 24221; 27573; 27585
Version 1.3
Discipline Life Sciences; Medicine