MicroReset: characterization of the rabbit (Oryctolagus cuniculus) fecal metagenome and resistome by deep shotgun sequencing

Dataset Overview

This dataset provides:

A non-redundant, high-quality gene catalog comprising 5.7 million genes
2,323 Metagenome-Assembled Genomes (MAGs)
1,053 Metagenomic Species Pan-genomes (MSPs)

It is designed to support the analysis of shotgun metagenomic data from the rabbit gut microbiota.

Materials and Methods

Data Source

The dataset was generated from 30 rabbit fecal samples subjected to deep shotgun metagenomic sequencing. The sequencing data are available under BioProject PRJEB50625.

Metagenomic Assembly

Raw sequencing reads were first pre-processed using fastp for adapter removal and quality trimming. Host-derived reads were filtered out by mapping to the rabbit reference genome (GCF_000001635.27) using Bowtie2 and removing mapped reads with Samtools.
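
The host-filtering rule can be illustrated with SAM flag bits. The sketch below assumes paired-end reads and assumes that a pair is kept only when neither mate aligned to the host genome; the text does not state the exact Samtools options used, so this policy and the function name keep_pair are illustrative only.

```python
# SAM flag bits defined by the SAM specification.
READ_UNMAPPED = 0x4   # this read did not align
MATE_UNMAPPED = 0x8   # its mate did not align


def keep_pair(flag: int) -> bool:
    """Return True when both mates are unmapped to the host genome.

    Assumption: a pair is treated as non-host only if neither mate
    aligned; any pair with at least one aligned mate is discarded.
    """
    return bool(flag & READ_UNMAPPED) and bool(flag & MATE_UNMAPPED)


# Example flags: 77 and 141 are the two mates of a fully unmapped pair,
# 99 is a properly paired mapped read, 73 has a mapped read with an
# unmapped mate.
for flag in (77, 141, 99, 73):
    print(flag, keep_pair(flag))
```

With Samtools itself, a comparable selection is commonly expressed by requiring both flag bits (value 12) at once.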

Each sample was individually assembled using metaSPAdes. Contigs shorter than 1,500 bp were excluded from downstream analysis.
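
As a minimal sketch of the length filter described above, the following reads a FASTA stream and keeps contigs of at least 1,500 bp. The function names and the in-memory example are hypothetical; only the threshold comes from the text.

```python
import io

MIN_CONTIG_LENGTH = 1500  # bp, threshold stated in the text


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(handle, min_length=MIN_CONTIG_LENGTH):
    """Return the contigs whose length is at least min_length."""
    return [(h, s) for h, s in read_fasta(handle) if len(s) >= min_length]


# Toy input: one contig just below the threshold, one exactly at it,
# and one whose sequence is wrapped over two lines.
example = io.StringIO(
    ">short\n" + "A" * 1499 + "\n"
    ">exact\n" + "C" * 1500 + "\n"
    ">wrapped\n" + "G" * 1000 + "\n" + "T" * 1000 + "\n"
)
kept = filter_contigs(example)
print([name for name, _ in kept])
```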

MAG Recovery

Reads from each sample were mapped to all 30 assemblies (30×30 mappings) using Bowtie2. The resulting alignments were sorted and indexed with Samtools. Contig coverage across all samples was computed using jgi_summarize_bam_contig_depths. Binning was performed with MetaBAT 2 and SemiBin v1.3.
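
The 30×30 design means that each assembly receives one alignment file per sample, so every contig gets a coverage profile across all 30 samples for binning. The sketch below only enumerates the jobs; the sample names and BAM file naming scheme are hypothetical.

```python
from itertools import product


def mapping_jobs(samples):
    """Pair every read set with every assembly: (reads_sample, assembly_sample)."""
    return list(product(samples, repeat=2))


def bams_per_assembly(jobs):
    """Group the expected BAM file names by the assembly they were mapped to."""
    grouped = {}
    for reads_sample, assembly_sample in jobs:
        # Hypothetical file naming, for illustration only.
        bam = f"{reads_sample}_vs_{assembly_sample}.sorted.bam"
        grouped.setdefault(assembly_sample, []).append(bam)
    return grouped


samples = [f"S{i:02d}" for i in range(1, 31)]  # hypothetical sample names
jobs = mapping_jobs(samples)
grouped = bams_per_assembly(jobs)
print(len(jobs), len(grouped["S01"]))  # 900 30
```

Each group of 30 BAM files would then be the input for computing the depth table of one assembly.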

MAG quality was assessed with CheckM. Only high-quality MAGs (≥70% completeness, ≤5% contamination, N50 ≥ 8 kb) were retained.
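
The retention criteria can be written as a simple predicate. This is a sketch under assumptions: the MagStats record is a hypothetical container, 8 kb is taken to mean 8,000 bp, and N50 is computed with its usual definition rather than taken from any specific tool.

```python
from dataclasses import dataclass


@dataclass
class MagStats:
    name: str
    completeness: float   # percent, as reported by a quality tool
    contamination: float  # percent
    n50: int              # bp


def n50(lengths):
    """Return the length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def is_retained(mag, min_completeness=70.0, max_contamination=5.0, min_n50=8000):
    """Apply the three thresholds stated in the text."""
    return (
        mag.completeness >= min_completeness
        and mag.contamination <= max_contamination
        and mag.n50 >= min_n50
    )


mags = [
    MagStats("bin.1", 92.3, 1.2, 45000),
    MagStats("bin.2", 69.9, 0.5, 30000),   # fails completeness
    MagStats("bin.3", 88.0, 5.1, 30000),   # fails contamination
    MagStats("bin.4", 75.0, 2.0, 7999),    # fails N50
]
print([m.name for m in mags if is_retained(m)])
```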

Non-Redundant Gene Catalog

Gene prediction was carried out using Prodigal on all contigs from the current study (with -m -p meta). Genes shorter than 90 bp or lacking start/stop codons were discarded. The remaining genes from both sources were pooled and clustered using CD-HIT-EST (parameters: -c 0.95 -aS 0.90 -G 0 -d 0 -M 0 -T 0). The longest contigs were used to select representative genes.
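
The gene filter can be sketched from Prodigal's FASTA headers, which carry the gene coordinates and a partial field. The sketch assumes that "lacking start/stop codons" corresponds to a partial value other than 00 (a gene running off a contig edge); the function names are hypothetical.

```python
MIN_GENE_LENGTH = 90  # bp, threshold stated in the text


def parse_prodigal_header(header):
    """Split a Prodigal FASTA header into (gene_id, start, end, attributes)."""
    gene_id, start, end, _strand, attr_text = header.lstrip(">").rsplit(" # ", 4)
    attributes = dict(
        item.split("=", 1) for item in attr_text.split(";") if "=" in item
    )
    return gene_id, int(start), int(end), attributes


def keep_gene(header, min_length=MIN_GENE_LENGTH):
    """Keep genes that are long enough and complete at both ends."""
    _gene_id, start, end, attributes = parse_prodigal_header(header)
    length = end - start + 1
    # Assumption: partial=00 marks a gene with both start and stop codons.
    return length >= min_length and attributes.get("partial") == "00"


headers = [
    ">NODE_1_1 # 1 # 300 # 1 # ID=1_1;partial=00;start_type=ATG",
    ">NODE_1_2 # 400 # 480 # -1 # ID=1_2;partial=00;start_type=ATG",
    ">NODE_2_1 # 1 # 600 # 1 # ID=2_1;partial=10;start_type=Edge",
]
print([keep_gene(h) for h in headers])
```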

MSP Recovery

Shotgun reads from the 30 samples were aligned to the non-redundant gene catalog using the Meteor suite, generating a gene abundance matrix (5.7 million genes × 30 samples). Co-abundant genes were grouped into 1,053 Metagenomic Species Pan-genomes (MSPs) using MSPminer.
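
To give an intuition for co-abundance, the sketch below compares gene abundance profiles across samples with a Pearson correlation. This is a simplified stand-in chosen for illustration; it is not the measure implemented in MSPminer, and the toy profiles are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-variation signal
    return cov / sqrt(var_x * var_y)


# Toy profiles over four samples (invented values).
gene_a = [10, 20, 0, 40]
gene_b = [5, 10, 0, 20]    # proportional to gene_a: same species, plausibly
gene_c = [40, 0, 20, 10]   # unrelated pattern

print(round(pearson(gene_a, gene_b), 6))
print(pearson(gene_a, gene_c) < 0)
```

Genes whose profiles rise and fall together across samples are candidates for belonging to the same species pan-genome.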

Taxonomic Annotation of MSPs

MAGs representing each species were taxonomically annotated using GTDB-Tk with GTDB release r214. The resulting taxonomy was propagated to the corresponding MSPs.

Phylogenetic Tree Construction

A set of 39 universal phylogenetic marker genes was extracted from the 1,053 MSPs (or their corresponding MAGs, when available) using fetchMGs. Each marker was independently aligned using MUSCLE, and the alignments were concatenated and trimmed using trimAl (parameter: -automated1). A maximum-likelihood phylogenetic tree was constructed with FastTreeMP (parameters: -gamma -pseudo -spr -mlacc 3 -slownni).
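
The concatenation step can be sketched as follows. Each marker alignment is represented as a mapping from taxon to aligned sequence; padding a taxon that lacks a marker with gap characters is an assumption made here, since the text does not say how missing markers were handled. All names are hypothetical.

```python
def concatenate_alignments(alignments, taxa):
    """Join per-marker alignments into one supermatrix row per taxon.

    alignments: list of dicts mapping taxon -> aligned sequence.
    taxa: ordered list of all taxa expected in the output.
    """
    parts = {taxon: [] for taxon in taxa}
    for alignment in alignments:
        widths = {len(seq) for seq in alignment.values()}
        if len(widths) != 1:
            raise ValueError("sequences within one alignment must share a length")
        width = widths.pop()
        for taxon in taxa:
            # Assumption: a missing marker is filled with gaps.
            parts[taxon].append(alignment.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


marker_1 = {"msp_0001": "MKV-L", "msp_0002": "MKIAL"}
marker_2 = {"msp_0001": "GT-"}  # msp_0002 lacks this marker
supermatrix = concatenate_alignments([marker_1, marker_2], ["msp_0001", "msp_0002"])
for taxon, row in supermatrix.items():
    print(taxon, row)
```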

Identifier
DOI https://doi.org/10.15454/5EJKAS
Metadata Access https://entrepot.recherche.data.gouv.fr/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.15454/5EJKAS
Provenance
Creator PLAZA ONATE, Florian; COULIBALY, Camille; ACHARD, Caroline; GHOZLANE, Amine; PONS, Nicolas; RUPPE, Etienne; DILE, Benoit; BOUCHER, Samuel; CHATELLIER, Stéphane; LE NORMAND, Bernadette; SAUSSET, Romain; LE CHATELIER, Emmanuelle; PETIT, Marie-Agnes; ESTELLE, Jordi; ZEMB, Olivier; ALMEIDA, Mathieu
Publisher Recherche Data Gouv
Contributor PLAZA ONATE, Florian
Publication Year 2022
Rights etalab 2.0; info:eu-repo/semantics/openAccess; https://spdx.org/licenses/etalab-2.0.html
OpenAccess true
Contact PLAZA ONATE, Florian (INRAE MetaGenoPolis)
Representation
Resource Type Dataset
Format application/x-xz; application/x-gzip; text/tab-separated-values
Size 271170908; 27710; 1168237448; 6044072; 1039434447; 1706046583; 26688728; 554262; 22825
Version 5.1
Discipline Life Sciences