Supplementary Datasets for Amino acid and codon usage explain amino acid misincorporation rates across the tree of life

Protein translation is an error-prone process resulting in a random population of altered protein sequences in every cell. Here, we analyzed thousands of publicly available mass spectrometry datasets to detect amino acid misincorporations and quantify error rates in 14 model organisms. We find that overall error rates and the patterns of codon-to-amino-acid error rates correlate across species. We estimate that on average 0.5-3% of protein molecules in a cell harbor a misincorporation, whereas this proportion can reach 10% for very long proteins. Highly expressed and very long proteins have lower error rates, indicating evolutionary selection on codon usage to reduce the cost of translation errors. While both codon-anticodon mispairing and tRNA mischarging contribute to misincorporations, we estimate that ~70% of misincorporation events are due to mispairing. The more frequent an amino acid is in the proteome, the more likely it is to be misincorporated (r = 0.53), likely because frequent amino acids are abundant in the cell, which increases the rate of mischarging, and have abundant tRNAs, which leads to increased mispairing. Overall, we find that amino acid and codon usage explain error rates. The conserved patterns of amino acid misincorporations from bacteria to humans suggest universal mechanisms driving translational fidelity.
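To illustrate why the share of molecules carrying a misincorporation grows with protein length, here is a minimal Python sketch. It assumes errors occur independently at a single uniform per-residue rate, which is a simplification and not the estimation method used for these datasets. The function name and the rate value are hypothetical and chosen only for illustration; they are not taken from the data.

```python
def fraction_with_error(per_residue_rate: float, length: int) -> float:
    """Probability that a protein of `length` residues carries at least one
    misincorporation, ASSUMING independent errors at a uniform per-residue
    rate (a simplifying assumption, not the authors' model)."""
    if not 0.0 <= per_residue_rate <= 1.0:
        raise ValueError("per_residue_rate must be between 0 and 1")
    if length < 0:
        raise ValueError("length must be non-negative")
    # P(at least one error) = 1 - P(no error at any of the `length` positions)
    return 1.0 - (1.0 - per_residue_rate) ** length


# Hypothetical per-residue error rate, for illustration only.
HYPOTHETICAL_RATE = 1e-4

for length in (100, 300, 1000, 3000):
    share = fraction_with_error(HYPOTHETICAL_RATE, length)
    print(f"length {length:>5}: {share:.2%} of molecules with >= 1 error")
```

Under this simplified model the affected share rises monotonically with length. The abstract also reports that very long proteins have lower per-residue error rates, which a uniform-rate sketch like this one does not capture.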

Identifier
DOI	https://doi.org/10.17617/3.WCUCRH
Metadata Access	https://edmond.mpg.de/api/datasets/export?exporter=dataverse_json&persistentId=doi:10.17617/3.WCUCRH
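The Metadata Access link above can also be built and queried programmatically. The following is a minimal sketch using only the Python standard library. The endpoint, exporter name, and persistent identifier are copied from the link above; the function names are hypothetical, and the layout of the returned JSON is not described here, so inspect it before relying on specific fields.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Values copied from the Metadata Access link of this record.
EXPORT_ENDPOINT = "https://edmond.mpg.de/api/datasets/export"
PERSISTENT_ID = "doi:10.17617/3.WCUCRH"


def build_export_url(persistent_id: str = PERSISTENT_ID,
                     exporter: str = "dataverse_json") -> str:
    """Assemble the metadata export URL for a dataset (hypothetical helper)."""
    # Keep ':' and '/' unescaped so the result matches the published link.
    query = urlencode(
        {"exporter": exporter, "persistentId": persistent_id}, safe=":/"
    )
    return f"{EXPORT_ENDPOINT}?{query}"


def fetch_metadata(url: str, timeout: float = 30.0):
    """Download and parse the exported metadata (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


print(build_export_url())
```

Calling `fetch_metadata(build_export_url())` would then return the parsed metadata, assuming the server is reachable and responds with JSON.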

Provenance
Creator	Toth-Petroczy, Agnes
Publisher	Edmond
Publication Year	2026
Funding Reference	Max Planck Gesellschaft
OpenAccess	true
Contact	tothpet(at)mpi-cbg.de

Representation
Language	English
Resource Type	Dataset
Version	1
Discipline	Other