UNN-LC High-Resolution Histopathological Lung Tissue Patch Dataset

The UNN-LC High-Resolution Histopathological Lung Tissue Patch Dataset is a collection of image patches designed for computational prognostic evaluation of lung cancer. The patches were compiled from a subset of 194 whole-slide images (WSIs) from the University Hospital of North Norway and cover a range of lung tissue conditions. Each patch measures 768 x 768 pixels.
The dataset was annotated by an oncologist (Thomas Kilvær) and a pathologist (Stig Dalen), with a concerted effort to minimize selection and labeling biases. Stig Dalen annotated patches containing predominantly cancer cells, including tumor-infiltrating lymphocytes (TILs). Thomas Kilvær annotated patches representing normal lung tissue. The two jointly annotated the reactive stroma with tertiary lymphoid structures and the necrosis areas. Annotations were acquired using QuPath and a custom-developed annotation tool.
The dataset categorizes patches into four classes: necrosis, tumor, stroma_tls, and normal_lung. The necrosis class contains patches of necrotic tissue associated with tumor regions. The normal_lung class represents areas of healthy lung tissue, including stromal components. The stroma_tls class contains patches of reactive stroma with dense tissue and lymphocyte aggregates. The tumor class comprises patches dominated by tumor content and may also include areas with TILs.
To extend the dataset and improve its class balance, additional patches from the LC25000 dataset can be integrated, which may make models trained on this data more robust.
The dataset is divided into training and testing sets to promote reproducibility in the development and validation of vision models. The training set includes a selection of patches from each class, and the testing set consists of the remaining patches.
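As a minimal sketch of how the four classes and the train/test split could be inventoried after unpacking the archives, the following Python counts image files per split and class. The folder layout (`<root>/<split>/<class>/`), the split folder names `train` and `test`, the image file extensions, and the function name `count_patches` are all assumptions made for illustration, not documented facts about the archives; adjust them to the actual contents.

```python
from collections import Counter
from pathlib import Path

# Class names as listed in the dataset description.
CLASSES = ("necrosis", "tumor", "stroma_tls", "normal_lung")
# Assumed split folder names; the real archive layout may differ.
SPLITS = ("train", "test")
# Assumed image extensions; the description does not state the file format.
IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".tif", ".tiff")


def count_patches(root, suffixes=IMAGE_SUFFIXES):
    """Count image files per (split, class) under an assumed root/split/class layout."""
    root = Path(root)
    counts = Counter()
    for split in SPLITS:
        for cls in CLASSES:
            folder = root / split / cls
            if not folder.is_dir():
                continue  # Missing folders are skipped, not treated as errors.
            counts[(split, cls)] = sum(
                1
                for path in folder.iterdir()
                if path.is_file() and path.suffix.lower() in suffixes
            )
    return counts
```

Calling `count_patches("path/to/unpacked/dataset")` returns a `Counter` keyed by `(split, class)`, which gives a quick view of class balance before deciding whether to add patches from another source such as LC25000.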

QuPath, 0.1.3

flet-patch-labeler, ad05dfe

Identifier
DOI https://doi.org/10.18710/ZZASBA
Related Identifier IsCitedBy https://doi.org/10.48550/arXiv.2405.02913
Metadata Access https://dataverse.no/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.18710/ZZASBA
Provenance
Creator Shvetsov, Nikita; Kilvær, Thomas Karsten; Dalen, Stig Manfred
Publisher DataverseNO
Contributor Shvetsov, Nikita; UiT The Arctic University of Norway; University Hospital of North Norway
Publication Year 2024
Funding Reference Research Council of Norway 309439 SFI VI; North Norwegian Health Authority HNF1521-20
Rights CC0 1.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/publicdomain/zero/1.0
OpenAccess true
Contact Shvetsov, Nikita (UiT The Arctic University of Norway)
Representation
Resource Type image data; Dataset
Format text/plain; text/comma-separated-values; application/zip
Size 8546; 115182; 8253806613; 188833; 9068962846
Version 1.0
Discipline Life Sciences; Medicine
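
The Metadata Access entry above is an OAI-PMH `GetRecord` request. As a small sketch, the following Python rebuilds that URL from the DOI using only the standard library. The helper name `build_getrecord_url` is hypothetical; the endpoint, metadata prefix, and identifier are taken from the record itself.

```python
from urllib.parse import urlencode

# Values copied from the Identifier section of this record.
OAI_ENDPOINT = "https://dataverse.no/oai"
DATASET_DOI = "10.18710/ZZASBA"


def build_getrecord_url(doi, metadata_prefix="oai_datacite", endpoint=OAI_ENDPOINT):
    """Build an OAI-PMH GetRecord URL for a DOI (hypothetical helper)."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "metadataPrefix": metadata_prefix,
            "identifier": f"doi:{doi}",
        },
        safe=":/",  # Keep ':' and '/' literal so the URL matches the listed form.
    )
    return f"{endpoint}?{query}"


url = build_getrecord_url(DATASET_DOI)
```

The resulting `url` can be passed to `urllib.request.urlopen` to retrieve the record as XML; the response is not shown here because it depends on the live service.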