Soil Metagenome Binning

Background
Advances in high-fidelity long-read (HiFi-LR) sequencing technologies offer unprecedented opportunities to uncover the genomic diversity of complex microbial environments, such as soil. While short-read (SR) sequencing has historically enabled broad insights into gene-level diversity, its limited read length constrains the reconstruction of complete genomes. Conversely, HiFi-LR sequencing enhances the quality and completeness of metagenome-assembled genomes (MAGs), enabling higher-resolution taxonomic and functional annotation. However, the high cost and relatively low throughput of HiFi-LR sequencing can limit genome recovery, particularly at the binning stage, where coverage depth is critical.

Results
Here, we present a novel hybrid strategy that differs from classical hybrid assemblies, in which SR and LR reads are used jointly at the assembly step. Instead, we use high-depth SR data to inform the binning of HiFi-LR contigs. Using both SR and HiFi-LR metagenomic datasets generated from a tunnel-cultivated soil sample, we demonstrate that SR-derived coverage profiles significantly improve the binning of HiFi-LR assemblies. This results in a substantial increase in the number and quality of recovered MAGs compared to using HiFi-LR data alone.

Conclusion
Our findings highlight that, even in the context of HiFi reads, combining SR and LR data remains beneficial in highly diverse environments, such as soil, not for hybrid assembly per se, but to enhance the downstream binning process. This cost-effective hybrid binning approach provides a practical framework for maximising genome recovery in complex microbiomes.
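
To make the notion of an SR-derived coverage profile concrete, the following is a minimal illustrative sketch, not the authors' pipeline. It assumes short reads have already been aligned to the HiFi-LR contigs and that each alignment is available as a (contig, start, end) interval. The function name `mean_coverage`, the contig names, and all values are hypothetical toy data.

```python
from collections import defaultdict


def mean_coverage(contig_lengths, alignments):
    """Return the mean short-read depth per contig.

    contig_lengths: dict mapping contig name -> length in bp
    alignments: iterable of (contig, start, end), 0-based, end-exclusive
    """
    aligned_bases = defaultdict(int)
    for contig, start, end in alignments:
        length = contig_lengths[contig]
        # Clip each alignment to the contig boundaries before counting bases.
        start, end = max(0, start), min(length, end)
        if end > start:
            aligned_bases[contig] += end - start
    # Mean depth = total aligned bases / contig length (0.0 if no reads map).
    return {
        name: aligned_bases[name] / length
        for name, length in contig_lengths.items()
    }


# Toy input (made-up values, for illustration only).
contig_lengths = {"contig_1": 1000, "contig_2": 500}
alignments = [
    ("contig_1", 0, 150),
    ("contig_1", 100, 250),
    ("contig_2", 0, 150),
    ("contig_2", 200, 350),
    ("contig_2", 300, 450),
]
profile = mean_coverage(contig_lengths, alignments)
```

In practice, binning tools typically combine such per-contig depth values (often across several samples) with sequence composition; this sketch covers only the depth part.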

Identifier
DOI https://doi.org/10.57745/NRED6I
Metadata Access https://entrepot.recherche.data.gouv.fr/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.57745/NRED6I
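
The Metadata Access link above follows the OAI-PMH GetRecord request pattern. As a minimal sketch, the URL can be rebuilt from the DOI with Python's standard library. The helper names `get_record_url` and `fetch_record` are hypothetical, and `fetch_record` assumes network access and that the endpoint answers with an XML record.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base endpoint taken from the Metadata Access link of this record.
OAI_ENDPOINT = "https://entrepot.recherche.data.gouv.fr/oai"


def get_record_url(doi, metadata_prefix="oai_datacite"):
    """Build an OAI-PMH GetRecord URL for a DOI."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": f"doi:{doi}",
    }
    # Keep ':' and '/' unescaped so the result matches the link as published.
    return f"{OAI_ENDPOINT}?{urlencode(params, safe=':/')}"


def fetch_record(url, timeout=30):
    """Download the raw metadata record (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


url = get_record_url("10.57745/NRED6I")
```

Calling `fetch_record(url)` would then return the record as text, which can be parsed with any XML library.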
Provenance
Creator Belliardo, Carole; Frioux, Clémence
Publisher Recherche Data Gouv
Contributor Belliardo, Carole; Maurice, Nicolas; Pere, Arthur; Danchin, Etienne; Institut Sophia Agrobiotech
Publication Year 2025
Funding Reference Agence nationale de la recherche ANR-22-PEAE-0011; Agence nationale de la recherche ANR-15-IDEX-01
Rights etalab 2.0; info:eu-repo/semantics/openAccess; https://spdx.org/licenses/etalab-2.0.html
OpenAccess true
Contact Belliardo, Carole (INRAE)
Representation
Resource Type Dataset
Format text/x-python; application/octet-stream; application/x-ipynb+json; text/x-python-script; text/comma-separated-values; video/quicktime; text/plain; text/tab-separated-values; text/html; text/tsv
Size 13250; 5717; 20500982415; 1424675088; 807; 9022986582; 25492494; 3078041467; 10137430; 11618858091; 32621735; 805; 692585; 3126; 3400; 3402; 23111394; 273375; 280878; 5488; 4673; 127277485; 7809; 367; 6841; 1059321; 707210; 1800728; 286185; 681177; 262410893; 929864; 909546; 899052; 1000820; 8810; 15528; 396; 10029; 17338; 397; 1394; 817; 704; 6236696; 971275301; 159546869; 495; 69; 2522089; 7968490; 23765; 24085; 2378627; 1071815; 4667427; 573; 207795939; 6718075; 601454; 878391; 138079731; 2112414
Version 1.0
Discipline Agriculture, Forestry, Horticulture; Computer Science; Geosciences; Agricultural Sciences; Agriculture, Forestry, Horticulture, Aquaculture; Agriculture, Forestry, Horticulture, Aquaculture and Veterinary Medicine; Earth and Environmental Science; Environmental Research; Life Sciences; Natural Sciences
Spatial Coverage France