The Cerrado biome in Brazil covers approximately 24% of the country. It is one of the richest and most diverse savannas in the world, with 23 vegetation types (physiognomies) consisting mostly of tropical savannas, grasslands, forests and dry forests. It is considered one of the global hotspots of biodiversity because of its high level of endemism and the rapid loss of its original habitat. This dataset includes maps of the vegetation in the Cerrado at two hierarchical levels of physiognomies. These physiognomies were defined by Ribeiro and Walter (2008) and form a hierarchical classification structure. The first hierarchical level (referred to as level-1) consists of three classes: grassland, savanna and forest. These are further split into a total of 12 subclasses in level-2. The maps were produced under the scope of the project "Development of systems to prevent forest fires and monitor vegetation cover in the Brazilian Cerrado" (World Bank Project #P143185), Forest Investment Program (FIP), in collaboration with the Earth Observation Lab at Humboldt University. The methodological approach is described in doi:10.5194/isprs-archives-XLIII-B3-2020-953-2020 (2020). The goal was to analyze the potential of Landsat Analysis Ready Data (ARD), in combination with different environmental data, to classify the vegetation in the Cerrado at these two hierarchical levels. The field data used for training and validation are included in this dataset. The classification accuracy was assessed using Monte Carlo simulation: in each of 1000 iterations, a random 70% of the samples was used to train the random forest (RF) classification model, and the remaining 30% were used for validation. A confusion matrix was calculated in each iteration, and the average confusion matrix was used to derive the overall accuracy and the class-wise f1-scores.
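The accuracy assessment described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code used to produce the maps. All function names are hypothetical, and `fit_predict` is an assumed callable that stands in for training the random forest model and predicting the validation samples. The `majority_class` function is only a placeholder classifier so that the sketch runs on its own.

```python
import random
from collections import Counter


def split_indices(n, train_frac, rng):
    """Randomly split sample indices into a training and a validation part."""
    idx = list(range(n))
    rng.shuffle(idx)
    cut = int(round(train_frac * n))
    return idx[:cut], idx[cut:]


def confusion_matrix(y_true, y_pred, classes):
    """Rows are reference classes, columns are predicted classes."""
    pos = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        m[pos[t]][pos[p]] += 1
    return m


def overall_accuracy(m):
    """Share of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in m)
    return sum(m[i][i] for i in range(len(m))) / total if total else 0.0


def f1_scores(m):
    """Class-wise f1 = 2*TP / (reference count + predicted count)."""
    scores = []
    for i in range(len(m)):
        tp = m[i][i]
        n_reference = sum(m[i])              # TP + FN
        n_predicted = sum(r[i] for r in m)   # TP + FP
        denom = n_reference + n_predicted
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def monte_carlo_assessment(X, y, fit_predict, n_iter=1000, train_frac=0.7, seed=0):
    """Average the validation confusion matrix over repeated random splits."""
    rng = random.Random(seed)
    classes = sorted(set(y))
    k = len(classes)
    total = [[0.0] * k for _ in range(k)]
    for _ in range(n_iter):
        train, test = split_indices(len(y), train_frac, rng)
        y_pred = fit_predict(
            [X[i] for i in train], [y[i] for i in train], [X[i] for i in test]
        )
        m = confusion_matrix([y[i] for i in test], y_pred, classes)
        for a in range(k):
            for b in range(k):
                total[a][b] += m[a][b]
    mean = [[v / n_iter for v in row] for row in total]
    return classes, mean


def majority_class(X_train, y_train, X_test):
    """Placeholder classifier (assumption): always predicts the most common label."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * len(X_test)
```

To use it, call `monte_carlo_assessment(X, y, fit_predict)` with the features and labels of the ground samples, then pass the returned average matrix to `overall_accuracy` and `f1_scores`. Deriving the metrics from the averaged matrix, rather than averaging per-iteration metrics, follows the procedure described in the text.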
On the first hierarchical level, with the three classes savanna, grassland and forest, the model reached f1-scores of 0.86, 0.87 and 0.85, respectively, and an overall accuracy of 0.86. On the second hierarchical level, a total of 12 vegetation physiognomies were differentiated with an overall accuracy of 0.77. The class-wise f1-scores on the second hierarchical level were: Campo limpo: 0.687, Campo rupestre: 0.528, Campo sujo: 0.851, Cerradao: 0.658, Cerrado rupestre: 0.847, Cerrado sensu stricto: 0.815, Ipuca: 0.830, Mata riparia: 0.743, Mata seca: 0.611, Palmeiral: 0.907, Parque de Cerrado: 0.966, Vereda: 0.364. The following data sets are provided here: (a) the classified maps in compressed TIFF format (one per hierarchical level) at 30-meter spatial resolution, (b) a QGIS style file for displaying the data in the QGIS software, (c) a CSV file with the training data set (2,828 ground samples).
The software used to produce the maps is available as open source at https://github.com/davidfrantz/force.
Note: The TIFF raster files use a geographic coordinate system with the WGS 84 datum.