Repository description

Data and code for "Functional group classification using consensus clustering" (Ubilla Pavez, Paz & Maynard, in revision, PLOS Computational Biology). This paper presents a consensus clustering method that classifies species into functional groups while accounting for trait uncertainty and trait correlation, using repeated resampling with Gaussian Mixture Models synthesized into a consensus matrix.

This repository contains the input trait data, taxonomic metadata, species name matching tables, and spatial diversity metrics used in the case study of global tree functional group classification. The consensus clustering pipeline code is available at https://github.com/pabloubilla/tree_clustering/.

File descriptions

Estimated_trait_table_with_monos.csv

Species-level trait data for 47,828 tree species across 18 traits (e.g., wood density, leaf area, tree height). Contains both observed and imputed (predicted) values from Maynard et al. (2022, Nature Communications). Each row is a species–trait combination.
Columns: accepted_bin (species binomial), fit (fitting method, e.g. "phy" for phylogenetic), LAT and LON (coordinates of observation, NA if imputed), TraitID (numeric trait identifier), trait (full trait name), trait_short (abbreviated trait name), quant (whether predicted using quantile random forest), pred_value (predicted/imputed trait value), obs_value (observed value, NA if unavailable).

taxonomic_information.csv

Taxonomic classification for each tree species.
Columns: genus, family, order, group (Angiosperms or Gymnosperms), accepted_bin (species binomial, used as join key), mono_fern (whether the species is a monocot or fern).

bgci_v1_3_matched_names.csv

Species name matching table linking names from the Botanic Gardens Conservation International (BGCI) ThreatSearch database (v1.3) to the accepted binomial names used in this study.
Columns: TaxonName (original name in BGCI), Author (taxonomic authority), accepted_bin (matched accepted binomial).

global_tree_search_trees_1_7.csv

Species list from GlobalTreeSearch (v1.7; Beech et al. 2017), a global database of tree species and country distributions maintained by BGCI.
Columns: TaxonName (species name), Author (taxonomic authority).

plot_diversity_metrics/grid_coordinates.csv

Grid cell reference table mapping grid IDs to geographic coordinates.
Columns: grid_id (unique identifier), Latitude, Longitude.

plot_diversity_metrics/Functional_group_results.csv

Functional group diversity metrics calculated per grid cell using the 42 functional groups identified in this study, based on presence–absence data from Paz et al. (2024, Global Ecology and Biogeography).
Columns: Latitude, Longitude, nclust (functional group richness, i.e. number of unique groups present), cluster_simpson (functional redundancy, i.e. Simpson's Index applied to functional groups).

plot_diversity_metrics/Paz_et_al_data.csv

Traditional diversity metrics per grid cell from Paz et al. (2024), used for comparison with the functional group metrics.
Columns: Latitude, Longitude, nspec (species richness), raoq (Rao's quadratic entropy, i.e. mean pairwise trait distance), fdr (functional richness, i.e. convex hull volume in trait space).
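
Example usage

The column layouts described above suggest two common first steps: reshaping the long trait table into a species-by-trait matrix, and joining the functional group metrics to the Paz et al. metrics per grid cell. The following is a minimal sketch in Python with pandas, not part of the authors' pipeline. The function names are hypothetical, and the sketch makes two assumptions: observed values are preferred over predicted ones where available, and the Latitude/Longitude values are written identically in both metric files so that an exact join works. The small data frames are made-up toy values (including the species names and the trait_short labels "wd" and "height") that only mimic the documented columns; they are not taken from the repository.

```python
import pandas as pd


def species_trait_matrix(traits: pd.DataFrame) -> pd.DataFrame:
    """Pivot the long trait table to one row per species, one column per trait.

    Assumption: use obs_value where present, otherwise fall back to pred_value.
    """
    value = traits["obs_value"].fillna(traits["pred_value"])
    long = traits.assign(value=value)
    # aggfunc="mean" only matters if a species-trait pair appears more than once.
    return long.pivot_table(
        index="accepted_bin", columns="trait_short", values="value", aggfunc="mean"
    )


def merge_grid_metrics(fg: pd.DataFrame, paz: pd.DataFrame) -> pd.DataFrame:
    """Join functional group metrics to the Paz et al. metrics per grid cell.

    Assumption: coordinates match exactly in both tables; validate= raises
    an error if a coordinate pair is duplicated in either table.
    """
    keys = ["Latitude", "Longitude"]
    return fg.merge(paz, on=keys, how="inner", validate="one_to_one")


# With the real files, the inputs would be read along these lines:
#   traits = pd.read_csv("Estimated_trait_table_with_monos.csv")
#   fg = pd.read_csv("plot_diversity_metrics/Functional_group_results.csv")
#   paz = pd.read_csv("plot_diversity_metrics/Paz_et_al_data.csv")

# Toy stand-ins with the documented column names (values are illustrative only).
nan = float("nan")
traits = pd.DataFrame({
    "accepted_bin": ["Species a", "Species a", "Species b", "Species b"],
    "trait_short": ["wd", "height", "wd", "height"],
    "pred_value": [0.5, 20.0, 0.7, 30.0],
    "obs_value": [0.6, nan, nan, 28.0],
})
fg = pd.DataFrame({
    "Latitude": [10.5, -3.5],
    "Longitude": [20.5, 40.5],
    "nclust": [5, 12],
    "cluster_simpson": [0.4, 0.8],
})
paz = pd.DataFrame({
    "Latitude": [10.5, -3.5],
    "Longitude": [20.5, 40.5],
    "nspec": [30, 150],
    "raoq": [1.2, 2.3],
    "fdr": [0.1, 0.9],
})

matrix = species_trait_matrix(traits)
merged = merge_grid_metrics(fg, paz)
print(matrix)
print(merged)
```

A species-by-trait matrix of this shape can be joined to taxonomic_information.csv on accepted_bin, which the file description identifies as the join key. If the coordinates in the two metric files differ in rounding, the exact join above will drop rows, and rounding both key columns first may be needed.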