Unraveling the Impact of Genome Assembly on Bacterial Typing: A One Health Perspective

In the context of global bacterial pathogen surveillance, interoperability and harmonized data are crucial. Several surveillance systems currently in use are designed to compare bacterial isolates and identify outbreak clusters based on core genome MultiLocus Sequence Typing (cgMLST) profiles. Among the approaches available to generate bacterial cgMLST profiles, our research used an assembly-based approach, as implemented in the chewBBACA tool, following European Food Safety Authority (EFSA) guidelines. Short-read sequencing was simulated for 27 bacterial pathogen species of interest in animal, plant, and human health to evaluate the repeatability and reproducibility of cgMLST profiles. Various quality parameters, such as read quality and sequencing depth, were applied, and read simulation and genome assembly were repeated several times using three commonly used assembly tools: SPAdes, Unicycler, and Shovill. The results revealed bioinformatic variability in cgMLST profiles that appears unrelated to the assembly tool and is instead driven by the intrinsic composition of the genomes themselves. The variability observed with simulated sequencing data was further validated with real data for five of the bacterial pathogens studied.
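Profile variability can be made concrete by counting allelic differences between cgMLST profiles obtained from replicate simulations and assemblies of the same genome. The sketch below is illustrative only. It is not the study's pipeline, and it does not use chewBBACA's API. It assumes each profile has already been loaded into a Python dict mapping locus name to allele call. Integer allele numbers are treated as valid calls, and any non-integer value (e.g. a hypothetical `"missing"` tag) is treated as no call.

```python
from itertools import combinations
from typing import Dict, List, Union

# Hypothetical in-memory representation of one cgMLST profile:
# locus name -> allele call. Assumption: valid calls are ints;
# any other value means the locus was not called.
Profile = Dict[str, Union[int, str]]


def _called(value: Union[int, str]) -> bool:
    """True if the allele call is a usable allele number."""
    return isinstance(value, int)


def allele_distance(a: Profile, b: Profile) -> int:
    """Count loci called in both profiles whose allele numbers differ.

    Loci absent or uncalled in either profile are ignored rather than
    counted as differences.
    """
    return sum(
        1
        for locus, allele in a.items()
        if locus in b and _called(allele) and _called(b[locus]) and allele != b[locus]
    )


def max_replicate_distance(replicates: List[Profile]) -> int:
    """Largest pairwise allelic distance among replicate profiles of one genome.

    A result of 0 means every pair of replicates agrees on all loci
    called in both members of the pair.
    """
    if len(replicates) < 2:
        return 0
    return max(allele_distance(x, y) for x, y in combinations(replicates, 2))
```

For example, with three replicate profiles `{"locus1": 1, "locus2": 4, "locus3": 7}`, `{"locus1": 1, "locus2": 5, "locus3": "missing"}` and `{"locus1": 2, "locus2": 4, "locus3": 7}`, the maximum pairwise distance is 2. Ignoring uncalled loci is one common convention for pairwise cgMLST distances. Counting them as differences instead would inflate distances for incomplete assemblies.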

Identifier
Source https://data.blue-cloud.org/search-details?step=~0122C297CDDA8F8D090CEE472D1AC318CCEBB0D838E
Metadata Access https://data.blue-cloud.org/api/collections/2C297CDDA8F8D090CEE472D1AC318CCEBB0D838E
Provenance
Instrument NextSeq 1000; ILLUMINA
Publisher Blue-Cloud Data Discovery & Access service; ELIXIR-ENA
Publication Year 2024
OpenAccess true
Contact blue-cloud-support(at)maris.nl
Representation
Discipline Marine Science
Spatial Coverage (-97.000W, 36.000S, 138.000E, 48.867N)
Temporal Coverage Begin 1997-01-01T00:00:00Z
Temporal Coverage End 2021-01-01T00:00:00Z