This is a subset of the dataset provided by the International Society for Photogrammetry and Remote Sensing (ISPRS), working group II/4, in the framework of the ''2D semantic labeling contest'' - benchmark 1. The original dataset [1] is composed of 33 orthorectified image tiles acquired by a near infrared (NIR) - green (G) - red (R) aerial camera over the town of Vaihingen (Germany). The average size of the tiles is 2494 x 2064 pixels with a spatial resolution of 9 cm. Images are accompanied by a digital surface model (DSM) representing the absolute height of pixels. 16 out of the 33 tiles are fully annotated at pixel level and are uploaded here in HDF5 format.
E.g., Vaihingen_xx.hdf5 contains:
x_1 = near infrared, red, green, nDSM, NDVI
y_1 = Groundtruth
m_1 = Boundaries
nDSM: normalized DSM; it represents the height of each pixel relative to the elevation of the nearest ground surface [2].
NDVI: Normalized Difference Vegetation Index
Groundtruth: annotated pixels
Boundaries: binary mask
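
The file layout above could be read along the following lines. This is a minimal sketch, not the uploader's own code: it assumes that x_1, y_1 and m_1 are HDF5 dataset names, that the five channels of x_1 are stored in the order listed above, and that the channel axis is the last one (check x.shape on the real file). The helper names load_tile, split_channels and ndvi are hypothetical. The ndvi function uses the standard definition (NIR - R) / (NIR + R) and may help as a sanity check against the stored NDVI channel, whose scaling is not stated here.

```python
import numpy as np

# Assumed channel order, taken from the x_1 line of the description.
CHANNELS = ("nir", "red", "green", "ndsm", "ndvi")


def load_tile(path, index=1):
    """Read one tile from an HDF5 file.

    Assumption: x_<index>, y_<index>, m_<index> are dataset names in the file.
    """
    import h5py  # imported lazily so the helpers below work without h5py

    with h5py.File(path, "r") as f:
        x = f[f"x_{index}"][()]  # input channels
        y = f[f"y_{index}"][()]  # Groundtruth
        m = f[f"m_{index}"][()]  # Boundaries
    return x, y, m


def split_channels(x, channel_axis=-1):
    """Return a dict mapping each channel name to a 2-D array.

    channel_axis is an assumption about the on-disk layout.
    """
    x = np.moveaxis(np.asarray(x), channel_axis, -1)
    if x.shape[-1] != len(CHANNELS):
        raise ValueError(f"expected {len(CHANNELS)} channels, got {x.shape[-1]}")
    return {name: x[..., i] for i, name in enumerate(CHANNELS)}


def ndvi(nir, red, eps=1e-8):
    """Standard NDVI = (NIR - R) / (NIR + R); eps avoids division by zero."""
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    return (nir - red) / (nir + red + eps)
```

A typical call would be x, y, m = load_tile("Vaihingen_xx.hdf5") followed by channels = split_channels(x), with xx replaced by the tile number of the file at hand.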
The semantic segmentation task involves discriminating 6 land-cover / land-use classes: ''impervious surfaces'' (IS) (roads, concrete surfaces), ''buildings'' (BU), ''low vegetation'' (LV), ''trees'' (TR), ''cars'' (CA) and ''clutter'' (CL), which represents uncategorizable land covers. Classes are highly imbalanced: ''buildings'' and ''impervious surfaces'' cover 50% of the data, while ''cars'' and ''clutter'' account for only 2% of the total labels. The 16 tiles are fully annotated at pixel level (i.e., Groundtruth). Since there is uncertainty associated with the boundaries of objects in the Groundtruth, these boundaries can be ignored in the evaluation (i.e., pixels with value 1 in the mask Boundaries).
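
The class imbalance and the boundary-ignoring evaluation described above can be illustrated as follows. This is a sketch under stated assumptions, not the official benchmark scoring: it assumes that Groundtruth stores integer labels 0 to 5 (the mapping of integers to classes is not given here, so CLASS_NAMES is only a guess at the order) and that Boundaries is an array of the same shape, with value 1 on pixels to ignore. The function names are hypothetical.

```python
import numpy as np

# Assumed label order; verify against the actual Groundtruth encoding.
CLASS_NAMES = ("IS", "BU", "LV", "TR", "CA", "CL")


def class_frequencies(labels, n_classes=6):
    """Fraction of pixels per class, assuming integer labels in [0, n_classes)."""
    flat = np.asarray(labels).astype(np.int64).ravel()
    counts = np.bincount(flat, minlength=n_classes)
    return counts / counts.sum()


def masked_accuracy(pred, truth, boundaries):
    """Overall accuracy over pixels whose Boundaries value is not 1."""
    keep = np.asarray(boundaries) != 1
    if not keep.any():
        raise ValueError("all pixels are masked out")
    pred = np.asarray(pred)
    truth = np.asarray(truth)
    return float(np.mean(pred[keep] == truth[keep]))
```

Given the imbalance, per-class scores are usually more informative than overall accuracy alone; the same keep mask can be applied before computing them.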
[1] http://www2.isprs.org/commissions/comm3/wg4/semantic-labeling.html
[2] Gerke, M. (2014). Use of the stair vision library within the ISPRS 2D semantic labeling benchmark (Vaihingen). Web publication/site, ResearchGate. DOI: 10.13140/2.1.5015.9683