WSI-Babel-Shark: Empty Whole-Slide Images for Slide-Label Metadata Extraction

Dataset

This dataset contains 22 whole-slide image (WSI) files in SVS format, digitized with a Leica GT450 scanner. All WSIs were intentionally scanned without tissue; only the physical slide labels are present. The dataset supports the evaluation and benchmarking of the WSI-Babel-Shark metadata-extraction pipeline.

Scanning empty slides keeps file sizes small, preserves the SVS metadata, and provides controlled conditions for benchmarking label-processing components, including OCR, DataMatrix decoding, stain parsing, SlideID reconstruction, and metadata harmonization. All WSIs retain full TIFF tiling, SVS headers, and Leica metadata. Files were manually inspected to ensure complete de-identification, and all CaseIDs and SlideIDs are synthetic test cases.
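SVS headers of this kind are typically stored as a pipe-delimited string of "key = value" pairs in the TIFF ImageDescription tag. The following is a minimal Python sketch of parsing such a string. The function name parse_svs_description and the sample string are illustrative assumptions; the sample is not taken from any file in this dataset.

```python
def parse_svs_description(description: str) -> dict:
    """Split an Aperio-style description into a header and key/value fields."""
    header, *fields = [part.strip() for part in description.split("|")]
    metadata = {"header": header}
    for field in fields:
        key, sep, value = field.partition("=")
        if sep:  # ignore fragments that carry no '='
            metadata[key.strip()] = value.strip()
    return metadata


# Illustrative string only (hypothetical values, not from the dataset).
example = "Example scanner header line|AppMag = 40|MPP = 0.2634|Filename = demo"
fields = parse_svs_description(example)
print(fields["AppMag"], fields["MPP"])
```

In practice the description string would be read from the first TIFF page of each SVS file with a TIFF or WSI reader of your choice; the keys present depend on the scanner firmware.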

A ground-truth CSV file containing validated metadata fields is included for benchmarking. No file contains patient-identifying information.
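One way to benchmark against the ground-truth file is per-field exact-match accuracy. The sketch below assumes hypothetical column names (filename, case_id, stain) and a comma delimiter; the actual column names and delimiter of the included file may differ, so adjust key and delimiter accordingly.

```python
import csv
import io


def read_rows(text, delimiter=","):
    """Parse delimited text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def field_accuracy(truth_rows, predicted_rows, key="filename"):
    """Return exact-match accuracy per field, matching rows on `key`."""
    predicted = {row[key]: row for row in predicted_rows}
    hits, totals = {}, {}
    for truth in truth_rows:
        guess = predicted.get(truth[key], {})  # missing rows count as misses
        for field, expected in truth.items():
            if field == key:
                continue
            totals[field] = totals.get(field, 0) + 1
            if (guess.get(field) or "").strip() == (expected or "").strip():
                hits[field] = hits.get(field, 0) + 1
    return {field: hits.get(field, 0) / totals[field] for field in totals}


# Hypothetical data for illustration; not taken from the dataset.
truth = read_rows("filename,case_id,stain\na.svs,C-001,HE\nb.svs,C-002,PAS\n")
predicted = read_rows("filename,case_id,stain\na.svs,C-001,HE\nb.svs,C-002,HE\n")
scores = field_accuracy(truth, predicted)
print(scores)
```

Exact matching is the strictest choice; a pipeline evaluation might additionally normalize case or whitespace before comparing.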

Identifier
DOI	https://doi.org/10.11588/DATA/ZBS9RS
Metadata Access	https://heidata.uni-heidelberg.de/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.11588/DATA/ZBS9RS
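The Metadata Access link is an OAI-PMH GetRecord request. As a small sketch, the same URL can be assembled from the DOI in Python; the function name oai_get_record_url is a hypothetical helper, while the endpoint, verb, metadata prefix, and identifier are taken from the link above.

```python
from urllib.parse import urlencode

OAI_ENDPOINT = "https://heidata.uni-heidelberg.de/oai"


def oai_get_record_url(doi: str, metadata_prefix: str = "oai_datacite") -> str:
    """Build the OAI-PMH GetRecord URL for a DOI (hypothetical helper)."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": f"doi:{doi}",
    }
    # Keep ':' and '/' unescaped so the result matches the published link.
    return f"{OAI_ENDPOINT}?{urlencode(params, safe=':/')}"


url = oai_get_record_url("10.11588/DATA/ZBS9RS")
print(url)
```

The response is an XML document and can be retrieved with any HTTP client.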

Provenance
Creator	Aliyari, Shahram
Publisher	heiDATA
Contributor	Aliyari, Shahram; Weis, Cleo-Aron
Publication Year	2025
Rights	CC0 1.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/publicdomain/zero/1.0
OpenAccess	true
Contact	Aliyari, Shahram (Heidelberg University, Institute of Pathology, Section of Computational Pathology)

Representation
Resource Type	Whole-Slide Images (SVS), Ground Truth Metadata CSV; Dataset
Format	text/tab-separated-values; text/plain; application/zip
Size	1893; 2419; 2697660195 (bytes)
Version	1.0
Discipline	Life Sciences; Medicine
Spatial Coverage	Heidelberg University Hospital, Institute of Pathology