Genome structure was analyzed using MCScanX (Wang et al., 2012) with default settings. First, the whole proteome of M. enterolobii, as predicted by EuGene, was compared against itself using BLASTP with an E-value cutoff of 1e-25, a maximum of five aligned sequences, and a maximum of one high-scoring segment pair (HSP). We then combined gene location information extracted from the EuGene GFF3 annotation file with the homology information from this all-versus-all BLASTP analysis to identify duplicated protein-coding genes and classify each into one of five groups, using the duplicate_gene_classifier program implemented in the MCScanX package.
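The input preparation described above can be sketched as follows. This is a minimal illustration and not the pipeline actually used. It assumes that MCScanX reads a four-column, tab-delimited gene location file (sequence name, gene identifier, start, end), that the identifiers in the proteome match the ID attribute of mRNA features in the GFF3 file, and that the BLAST settings map onto the BLAST+ options -evalue, -max_target_seqs and -max_hsps. The function names and file paths are hypothetical.

```python
def gff3_to_mcscanx(gff3_lines, feature_type="mRNA"):
    """Convert GFF3 records into MCScanX-style rows: seqid, gene ID, start, end.

    feature_type is an assumption: choose the feature whose ID matches the
    sequence identifiers used in the BLASTP output.
    """
    rows = []
    for line in gff3_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue  # keep only well-formed records of the requested type
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        gene_id = attrs.get("ID")
        if gene_id is None:
            continue
        rows.append("\t".join([cols[0], gene_id, cols[3], cols[4]]))
    return rows


def blastp_self_command(proteome_fasta, db_name, out_path):
    """Build (but do not run) an all-versus-all BLASTP command line.

    The mapping of the reported settings to these BLAST+ flags is an assumption.
    """
    return [
        "blastp",
        "-query", proteome_fasta,
        "-db", db_name,
        "-evalue", "1e-25",
        "-max_target_seqs", "5",
        "-max_hsps", "1",
        "-outfmt", "6",  # tabular output
        "-out", out_path,
    ]


# Toy records for illustration only; not taken from the real annotation.
example_gff3 = [
    "##gff-version 3",
    "scaf1\tEuGene\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "scaf1\tEuGene\tmRNA\t100\t900\t.\t+\t.\tID=mRNA1;Parent=gene1",
]
example_rows = gff3_to_mcscanx(example_gff3)
example_cmd = blastp_self_command("proteome.fa", "proteome_db", "proteome.blast")
print("\n".join(example_rows))
print(" ".join(example_cmd))
```

The command is returned as a list of arguments so that it can be inspected or passed to subprocess.run without shell quoting issues.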
The MCScanX analysis reveals that the majority of gene duplicates form whole duplicated blocks rather than dispersed, independent duplications. According to the classification produced by the duplicate_gene_classifier program, 39,532 protein-coding genes (approximately 86.1%) are predicted to be duplicated at least once. Furthermore, most of these genes (75.6%) show a duplication depth of two, meaning that two other copies exist for each of them, which further supports the hypothesis that the genome is triploid.
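The proportion of duplicated genes reported above can be derived from the classifier output along the following lines. This is a sketch under stated assumptions: it supposes that duplicate_gene_classifier writes one gene per line as an identifier followed by a numeric code, with 0 for singleton, 1 for dispersed, 2 for proximal, 3 for tandem and 4 for WGD/segmental duplicates. The function name and the toy input are hypothetical and do not reproduce the real data.

```python
from collections import Counter

# Assumed meaning of the numeric codes in the classifier output.
CLASS_NAMES = {
    0: "singleton",
    1: "dispersed",
    2: "proximal",
    3: "tandem",
    4: "WGD/segmental",
}


def summarize_gene_types(lines):
    """Count genes per class and return (counts, n_duplicated, pct_duplicated)."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        counts[CLASS_NAMES[int(parts[1])]] += 1
    total = sum(counts.values())
    duplicated = total - counts["singleton"]  # every non-singleton class
    pct = 100.0 * duplicated / total if total else 0.0
    return counts, duplicated, pct


# Toy input for illustration only.
toy_gene_types = [
    "geneA\t0",
    "geneB\t4",
    "geneC\t4",
    "geneD\t3",
    "geneE\t1",
]
toy_counts, toy_duplicated, toy_pct = summarize_gene_types(toy_gene_types)
print(dict(toy_counts), toy_duplicated, round(toy_pct, 1))
```

Any gene not labelled as a singleton is counted as duplicated at least once, which corresponds to the percentage quoted in the text.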