This dataset comprises the pretraining and testing data for our work, Terrain-Informed Self-Supervised Learning: Enhancing Building Footprint Extraction from LiDAR Data with Limited Annotations. The pretraining data consists of Digital Surface Model (DSM) and Digital Terrain Model (DTM) images from Norway, with a ground resolution of 1 meter, in the UTM 33N projection. The primary data source is the Norwegian Mapping Authority (Kartverket), which has made the data freely available on its website under the CC BY 4.0 license (Source: https://hoydedata.no/, License terms: https://creativecommons.org/licenses/by/4.0/).
The DSMs and DTMs are generated from 3D LiDAR point clouds collected through periodic aerial campaigns. During these campaigns, the LiDAR sensors capture data with a maximum offset of 20 degrees from nadir. A subset of the data also includes building footprint labels created from the OpenStreetMap (OSM) database: building footprints extracted from OSM were rasterized to match the grid of the DTMs and DSMs. These rasterized labels are made available under the Open Database License (ODbL), in compliance with the OSM license requirements. We hope this dataset facilitates various applications in geographic analysis, remote sensing, and machine learning research.
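For readers who want to reproduce or extend the labels, the idea of rasterizing a footprint polygon onto a raster grid can be sketched as follows. This is a minimal illustration only, not necessarily the procedure used to produce the labels in this dataset. The function name `rasterize_polygon`, its parameters, the cell-centre (even-odd) inclusion rule, and the example coordinates are all assumptions introduced here; in practice a GIS library would normally handle this step.

```python
import numpy as np


def rasterize_polygon(polygon_xy, x_min, y_max, width, height, cell_size=1.0):
    """Burn one polygon into a binary mask aligned to a north-up raster grid.

    Hypothetical helper for illustration.
    polygon_xy   : sequence of (x, y) vertices in map coordinates (e.g. metres).
    x_min, y_max : map coordinates of the top-left corner of the grid.
    A cell is labelled 1 when its centre lies inside the polygon (even-odd rule).
    """
    poly = np.asarray(polygon_xy, dtype=float)

    # Cell-centre coordinates; rows run north to south, so y decreases with row.
    xs = x_min + (np.arange(width) + 0.5) * cell_size
    ys = y_max - (np.arange(height) + 0.5) * cell_size
    px, py = np.meshgrid(xs, ys)

    inside = np.zeros((height, width), dtype=bool)
    x0, y0 = poly[-1]
    for x1, y1 in poly:
        if y0 != y1:  # horizontal edges never cross a horizontal ray
            # Edges whose endpoints lie on opposite sides of the cell centre's y.
            straddles = (y0 > py) != (y1 > py)
            # x position where the edge meets the horizontal line through the centre.
            x_cross = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
            # Toggle cells whose rightward ray crosses this edge.
            inside ^= straddles & (px < x_cross)
        x0, y0 = x1, y1
    return inside.astype(np.uint8)


# Example with made-up coordinates: a 3 m x 4 m rectangular footprint
# on an 8 x 8 grid of 1 m cells whose top-left corner is at (0, 8).
footprint = [(2.0, 2.0), (5.0, 2.0), (5.0, 6.0), (2.0, 6.0)]
mask = rasterize_polygon(footprint, x_min=0.0, y_max=8.0, width=8, height=8)
print(mask)
```

Because the mask shares the origin, cell size, and shape of the elevation grid, each label pixel lines up with the corresponding DSM and DTM pixel. A different inclusion rule (for example, marking every cell the polygon touches) would give slightly different building edges.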