<b>Supplemental Files for the paper "The distinct roles of genome, methylation, transcription, and translation on protein expression in <i>Arabidopsis thaliana</i> resolve the Central Dogma’s information flow"</b>

The archive contains genome annotation data from the paper “The distinct roles of genome, methylation, transcription, and translation on protein expression in Arabidopsis thaliana resolve the Central Dogma’s information flow” by Zhong et al., Genome Biology 2025.

The PDF file contains the submitted version of the paper (7/08/2025).

The R markdown file 'col-can-figures.Rmd' contains the code used to generate most of the figures in the paper (except Figs. 1 and 11).

There are six files in the compressed tar archive Supplemental_files.tar.gz, namely:

Supplemental File S1: Col_new_annotation_clean1_uniprot_function.gff3. This contains the genome annotations in GFF3 format for the Arabidopsis Col-0 genome assembly.
Supplemental File S2: Can_new_annotation_clean1_uniprot_function.gff3. This contains the genome annotations in GFF3 format for the Arabidopsis Can-0 genome assembly.
Supplemental File S3: Col_new_annotation_clean1.mRNA.fa. This contains the mRNA sequences of all annotated genes in the Col-0 genome assembly.
Supplemental File S4: Col_new_annotation_clean1.pep.fa. This contains the peptide sequences of all annotated genes in the Col-0 genome assembly.
Supplemental File S5: Can_new_annotation_clean1.mRNA.fa. This contains the mRNA sequences of all annotated genes in the Can-0 genome assembly.
Supplemental File S6: Can_new_annotation_clean1.pep.fa. This contains the peptide sequences of all annotated genes in the Can-0 genome assembly.

Abstract from the paper:

Background: We investigate the flow of genetic information from DNA to RNA to protein as described by the central dogma in molecular biology, to determine the impact of intermediate genomic levels on plant protein expression.

Results: We perform genomic profiling of rosette leaves in two Arabidopsis accessions, Col-0 and Can-0, and assemble their genomes using long reads and chromatin interaction data. We measure gene and protein expression in biological replicates grown in a controlled environment, also measuring CpG methylation, ribosome-associated transcript levels and tRNA abundance. Each omic level is highly reproducible between biological replicates and between accessions despite their ~1% sequence divergence; the single best predictor of any level in one accession is the corresponding level in the other. Within each accession, gene codon frequencies accurately model both mRNA and protein expression. The effects of a codon on mRNA and protein expression are highly correlated but independent of genome-wide codon frequencies or tRNA levels, which instead match genome-wide amino acid frequencies. Ribosome-associated transcripts closely track mRNA levels.

Conclusions: DNA codon frequencies and mRNA expression levels are the main predictors of protein abundance. In the absence of environmental perturbation, neither gene-body methylation, tRNA abundance nor ribosome-associated transcript levels add appreciable information. The impact of constitutive gene-body methylation is mostly explained by gene codon composition. tRNA abundance tracks overall amino acid demand. However, genetic differences between accessions associate with differential gene-body methylation by inflating differential expression variation. Our data show that the dogma holds only if both sequence and abundance information in mRNA are considered.
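For readers who want to inspect the six supplemental files programmatically, the following is a minimal sketch in Python using only the standard library. It is not part of the deposit and not the authors' code. The function names (list_archive_members, count_gff3_feature_types, read_fasta) are hypothetical, and the sketch assumes only that the .gff3 files follow the standard nine-column tab-separated GFF3 layout and that the .fa files are plain FASTA. The directory layout inside Supplemental_files.tar.gz is not described in this record, so the sketch lists members rather than assuming paths.

```python
import tarfile
from collections import Counter
from typing import Dict, Iterable, List, Optional


def list_archive_members(archive_path: str) -> List[str]:
    """Return the names of the regular files inside a gzip-compressed tar archive."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return [member.name for member in archive.getmembers() if member.isfile()]


def count_gff3_feature_types(lines: Iterable[str]) -> Counter:
    """Count feature types (GFF3 column 3), skipping comments and blank lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # directive, comment, or blank line
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # assumption: anything else is not a feature line
        counts[fields[2]] += 1
    return counts


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA record ID (first word after '>') to its full sequence."""
    records: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            current = header[0] if header else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}
```

After downloading and extracting the archive, one could pass an open file handle for one of the .gff3 files to count_gff3_feature_types, or a handle for a .mRNA.fa or .pep.fa file to read_fasta. Both functions read line by line; read_fasta keeps all sequences in memory, which should be acceptable for a single Arabidopsis annotation but is a design choice of this sketch rather than a requirement.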

Identifier
DOI https://doi.org/10.5522/04/28163222.v3
Related Identifier HasPart https://ndownloader.figshare.com/files/57005783
Related Identifier HasPart https://ndownloader.figshare.com/files/57006602
Related Identifier HasPart https://ndownloader.figshare.com/files/57666856
Metadata Access https://api.figshare.com/v2/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=oai:figshare.com:article/28163222
Provenance
Creator Mott, Richard; Zhong, Ziming; Bailey, Mark; Kim, Yong-In; Pesaran-Afsharyan, Nazanin (ORCID: 0000-0003-0298-988x); Parker, Briony; Arathoon, Louise; Li, Xiaowei; Rundle, Chelsea; Behrens, Andrew; Nedialkova, Danny D.; Slavov, Gancho; Hassani-Pak, Keywan; Lilley, Kathryn S.; Theodoulou, Frederica L.
Publisher University College London UCL
Contributor Figshare
Publication Year 2025
Rights https://creativecommons.org/publicdomain/zero/1.0/
OpenAccess true
Contact researchdatarepository(at)ucl.ac.uk
Representation
Language English
Resource Type Dataset
Discipline Other