MicroReset: characterization of the rabbit (Oryctolagus cuniculus) fecal metagenome and resistome by deep shotgun sequencing

Dataset Overview

This dataset provides:

- a non-redundant, high-quality gene catalog comprising 5.7 million genes
- 2,323 Metagenome-Assembled Genomes (MAGs)
- 1,053 Metagenomic Species Pan-genomes (MSPs)

It is designed to support the analysis of shotgun metagenomic data from the rabbit gut microbiota.

Materials and Methods

Data Source

The dataset was generated from 30 rabbit fecal samples subjected to deep shotgun metagenomic sequencing. The sequencing data are available under BioProject accession PRJEB50625.

Metagenomic Assembly

Raw sequencing reads were first pre-processed using fastp for adapter removal and quality trimming. Host-derived reads were filtered out by mapping to the rabbit reference genome (GCF_000001635.27) using Bowtie2 and removing mapped reads with Samtools.
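Keeping only non-host reads typically comes down to the SAM FLAG field: a read pair is host-free when both the read and its mate are unmapped. The exact Samtools command used for this dataset is not given, so the sketch below assumes the common `samtools view -f 12` approach and reproduces its bit test in Python; the read names in the demo are made up.

```python
# SAM FLAG bits, as defined in the SAM format specification.
READ_UNMAPPED = 0x4  # this read did not align
MATE_UNMAPPED = 0x8  # its mate did not align


def is_host_free(flag: int) -> bool:
    """True when neither the read nor its mate aligned to the host genome.

    Same bit test as `samtools view -f 12` (0x4 | 0x8 == 12); this is an
    assumed approach, not necessarily the command used for this dataset.
    """
    return (flag & READ_UNMAPPED) != 0 and (flag & MATE_UNMAPPED) != 0


def filter_sam_lines(lines):
    """Yield header lines unchanged and only the alignment records that are host-free."""
    for line in lines:
        if line.startswith("@"):
            yield line
        elif is_host_free(int(line.split("\t")[1])):
            yield line


if __name__ == "__main__":
    sam = [
        "@HD\tVN:1.6",
        "r1\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tFFFF",             # both mates unmapped
        "r2\t99\tchr1\t100\t42\t4M\t=\t200\t104\tACGT\tFFFF",  # properly paired host hit
    ]
    for rec in filter_sam_lines(sam):
        print(rec.split("\t")[0])
```

Only the header and `r1` survive the filter; `r2` is a host-derived pair and is dropped.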

Each sample was individually assembled using metaSPAdes. Contigs shorter than 1,500 bp were excluded from downstream analysis.
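The length cutoff can be illustrated with a small FASTA filter. This is a minimal sketch, not the authors' script; metaSPAdes writes its assembled contigs to `contigs.fasta`, which is the kind of file the hypothetical `filter_contigs` helper would be applied to.

```python
MIN_CONTIG_LEN = 1500  # contigs shorter than 1,500 bp are excluded


def read_fasta(lines):
    """Parse an iterable of FASTA lines into (header, sequence) tuples."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(records, min_len=MIN_CONTIG_LEN):
    """Keep only contigs at least min_len bp long (hypothetical helper)."""
    return [(h, s) for h, s in records if len(s) >= min_len]


if __name__ == "__main__":
    demo = [">short", "ACGT" * 100, ">long", "ACGT" * 400]  # 400 bp and 1,600 bp
    print([h for h, _ in filter_contigs(read_fasta(demo))])
```

Multi-line sequences are joined before the length test, so wrapped FASTA files are handled correctly.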

MAG Recovery

Reads from each sample were mapped to all 30 assemblies (30×30 mappings) using Bowtie2. The resulting alignments were sorted and indexed with Samtools. Contig coverage across all samples was computed using jgi_summarize_bam_contig_depths. Binning was performed with MetaBAT 2 and SemiBin v1.3.
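The all-versus-all design means 30 × 30 = 900 read-to-assembly mappings, followed by one depth table per assembly summarizing coverage across all 30 samples. The sketch below only enumerates those jobs; the file-naming scheme, index prefixes, sample IDs and helper names are hypothetical, and only standard options (`bowtie2 -x/-1/-2`, `samtools sort -o`, `jgi_summarize_bam_contig_depths --outputDepth`) are used.

```python
from itertools import product


def mapping_jobs(samples):
    """All (reads_sample, assembly_sample) pairs: n samples give n * n mappings."""
    return list(product(samples, repeat=2))


def bowtie2_command(reads_sample, assembly_sample):
    """Shell command mapping one sample's reads to one assembly, then sorting and indexing."""
    index = f"{assembly_sample}.contigs"   # hypothetical Bowtie2 index prefix
    r1 = f"{reads_sample}_R1.fastq.gz"     # hypothetical read file names
    r2 = f"{reads_sample}_R2.fastq.gz"
    bam = f"{reads_sample}_vs_{assembly_sample}.bam"
    return (f"bowtie2 -x {index} -1 {r1} -2 {r2} "
            f"| samtools sort -o {bam} - && samtools index {bam}")


def depth_command(assembly_sample, samples):
    """Summarize coverage of one assembly's contigs across every sample's BAM."""
    bams = " ".join(f"{s}_vs_{assembly_sample}.bam" for s in samples)
    return f"jgi_summarize_bam_contig_depths --outputDepth {assembly_sample}.depth.txt {bams}"


if __name__ == "__main__":
    samples = [f"S{i:02d}" for i in range(1, 31)]  # hypothetical sample IDs
    jobs = mapping_jobs(samples)
    print(len(jobs))  # 900
    print(bowtie2_command(*jobs[1]))
    print(depth_command("S01", samples)[:80])
```

The per-assembly depth tables are the multi-sample coverage profiles that the binners use alongside sequence composition.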

MAG quality was assessed with CheckM. Only high-quality MAGs (≥70% completeness, ≤5% contamination, N50 ≥ 8 kb) were retained.
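The retention rule combines CheckM estimates with the contig N50 of each bin. A minimal sketch of that filter follows; the `MagStats` container and `is_retained` helper are hypothetical, and 8 kb is taken as 8,000 bp.

```python
from dataclasses import dataclass, field


def n50(lengths):
    """Length L such that contigs of length >= L together hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer test for running >= total / 2
            return length
    return 0  # empty bin


@dataclass
class MagStats:
    name: str
    completeness: float   # % as reported by CheckM
    contamination: float  # % as reported by CheckM
    contig_lengths: list = field(default_factory=list)


def is_retained(mag, min_completeness=70.0, max_contamination=5.0, min_n50=8_000):
    """Apply the stated thresholds: >=70% completeness, <=5% contamination, N50 >= 8 kb."""
    return (mag.completeness >= min_completeness
            and mag.contamination <= max_contamination
            and n50(mag.contig_lengths) >= min_n50)


if __name__ == "__main__":
    bins = [MagStats("bin_a", 85.0, 2.0, [20_000, 9_000, 3_000]),
            MagStats("bin_b", 95.0, 1.0, [5_000] * 10)]
    print([m.name for m in bins if is_retained(m)])
```

`bin_b` is highly complete but fragmented (N50 of 5 kb), so the N50 criterion removes it.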

Non-Redundant Gene Catalog

Gene prediction was carried out with Prodigal on all contigs from the current study (parameters: -m -p meta). Genes shorter than 90 bp or lacking a start or stop codon were discarded. The remaining genes from both sources were pooled and clustered using CD-HIT-EST (parameters: -c 0.95 -aS 0.90 -G 0 -d 0 -M 0 -T 0), and the longest sequence in each cluster was retained as the representative gene.
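The gene filters and the CD-HIT-EST criteria can be expressed as simple predicates. This sketch assumes Prodigal's FASTA header convention, where `partial=00` marks a gene with both boundaries (start and stop) present; it only restates the acceptance thresholds of `-c 0.95 -aS 0.90 -G 0` and does not reproduce CD-HIT's greedy incremental clustering. All helper names and example headers are hypothetical.

```python
import re

PARTIAL_RE = re.compile(r"partial=([01]{2})")


def is_complete(header):
    """Prodigal headers carry partial=XY; '00' means both gene boundaries are present."""
    m = PARTIAL_RE.search(header)
    return m is not None and m.group(1) == "00"


def keep_gene(header, seq, min_len=90):
    """Discard genes shorter than 90 bp or missing a start/stop codon."""
    return len(seq) >= min_len and is_complete(header)


def meets_cdhit_thresholds(identity, aligned_len, shorter_len, c=0.95, a_s=0.90):
    """Acceptance rule implied by -c 0.95 -aS 0.90 -G 0.

    identity: identical bases / alignment length (local identity, as with -G 0).
    aligned_len: bases of the shorter sequence covered by the alignment.
    """
    return identity >= c and aligned_len / shorter_len >= a_s


def pick_representative(cluster):
    """CD-HIT seeds each cluster with its longest sequence; return that (header, seq)."""
    return max(cluster, key=lambda rec: len(rec[1]))


if __name__ == "__main__":
    h = "k141_1_1 # 2 # 301 # 1 # ID=1_1;partial=00;start_type=ATG"
    print(keep_gene(h, "A" * 300), meets_cdhit_thresholds(0.97, 92, 100))
```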

MSP Recovery

Shotgun reads from the 30 samples were aligned to the non-redundant gene catalog using the Meteor suite, generating a gene abundance matrix (5.7 million genes × 30 samples). Co-abundant genes were grouped into 1,053 Metagenomic Species Pan-genomes (MSPs) using MSPminer.

Taxonomic Annotation of MSPs

MAGs representing each species were taxonomically annotated using GTDB-Tk with GTDB release r214. The resulting taxonomy was propagated to the corresponding MSPs.
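Propagating taxonomy amounts to a lookup from each MSP to the MAG that represents it. The sketch assumes a GTDB-Tk summary table with its `user_genome` and `classification` columns; the MSP-to-MAG mapping, identifiers and lineage names in the example are hypothetical.

```python
import csv
import io

RANKS = ("domain", "phylum", "class", "order", "family", "genus", "species")


def parse_classification(lineage):
    """Turn 'd__X;p__Y;...;s__Z' into {rank: name}; empty ranks (e.g. 's__') map to None."""
    taxonomy = dict.fromkeys(RANKS)
    for rank, token in zip(RANKS, lineage.split(";")):
        _, _, name = token.partition("__")
        taxonomy[rank] = name or None
    return taxonomy


def propagate_taxonomy(summary_tsv_text, msp_to_mag):
    """Give each MSP the GTDB-Tk classification of the MAG chosen to represent it."""
    rows = csv.DictReader(io.StringIO(summary_tsv_text), delimiter="\t")
    mag_taxonomy = {row["user_genome"]: parse_classification(row["classification"])
                    for row in rows}
    # MSPs whose MAG is absent from the summary get None.
    return {msp: mag_taxonomy.get(mag) for msp, mag in msp_to_mag.items()}


if __name__ == "__main__":
    tsv = ("user_genome\tclassification\n"
           "MAG_001\td__Bacteria;p__PhylumX;c__ClassX;o__OrderX;f__FamilyX;g__GenusX;s__\n")
    print(propagate_taxonomy(tsv, {"msp_0001": "MAG_001"}))
```

A lineage ending in `s__` (no species assigned) yields `species: None`, so MSPs resolved only to genus level stay distinguishable.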

Phylogenetic Tree Construction

A set of 39 universal phylogenetic marker genes was extracted from the 1,053 MSPs (or their corresponding MAGs, when available) using fetchMGs. Each marker was independently aligned using MUSCLE, and the alignments were concatenated and trimmed using trimAl (parameter: -automated1). A maximum-likelihood phylogenetic tree was constructed with FastTreeMP (parameters: -gamma -pseudo -spr -mlacc 3 -slownni).
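Concatenating per-marker alignments into a supermatrix requires padding taxa that lack a given marker with gaps, so every row keeps the same length. A minimal sketch of that step, before trimming with trimAl, is shown below; the function names are hypothetical and the exact concatenation script used for the tree is not described in the source.

```python
def concatenate_alignments(marker_alignments, taxa, gap="-"):
    """Build a supermatrix from per-marker alignments.

    marker_alignments: list of {taxon: aligned_sequence}, one dict per marker gene.
    Taxa lacking a marker receive a run of gaps as wide as that marker's alignment.
    """
    parts = {t: [] for t in taxa}
    for aln in marker_alignments:
        if not aln:
            continue  # marker absent from every taxon
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("all sequences in one marker alignment must have the same length")
        width = widths.pop()
        for t in taxa:
            parts[t].append(aln.get(t, gap * width))
    return {t: "".join(chunks) for t, chunks in parts.items()}


def to_fasta(supermatrix):
    """Serialize the supermatrix as FASTA text, e.g. as input for trimAl."""
    return "".join(f">{t}\n{seq}\n" for t, seq in supermatrix.items())


if __name__ == "__main__":
    alns = [{"A": "AC-G", "B": "ACTG"}, {"A": "MK", "C": "MR"}]
    print(to_fasta(concatenate_alignments(alns, ["A", "B", "C"])), end="")
```

Gap padding keeps marker columns aligned across all taxa, which is what allows the concatenated matrix to be trimmed and passed to FastTreeMP as a single alignment.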

Identifier
DOI	https://doi.org/10.15454/5EJKAS
Metadata Access	https://entrepot.recherche.data.gouv.fr/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.15454/5EJKAS

Provenance
Creator	PLAZA ONATE, Florian; COULIBALY, Camille; ACHARD, Caroline; GHOZLANE, Amine; PONS, Nicolas; RUPPE, Etienne; DILE, Benoit; BOUCHER, Samuel; CHATELLIER, Stéphane; LE NORMAND, Bernadette; SAUSSET, Romain; LE CHATELIER, Emmanuelle; PETIT, Marie-Agnes; ESTELLE, Jordi; ZEMB, Olivier; ALMEIDA, Mathieu
Publisher	Recherche Data Gouv
Contributor	PLAZA ONATE, Florian
Publication Year	2022
Rights	etalab 2.0; info:eu-repo/semantics/openAccess; https://spdx.org/licenses/etalab-2.0.html
OpenAccess	true
Contact	PLAZA ONATE, Florian (INRAE MetaGenoPolis)

Representation
Resource Type	Dataset
Format	application/x-xz; application/x-gzip; text/tab-separated-values
Size (bytes)	271170908; 27710; 1168237448; 6044072; 1039434447; 1706046583; 26688728; 554262; 22825
Version	5.1
Discipline	Life Sciences