PPIT: an R package for inferring microbial taxonomy from nifH sequences

Amplicon sequencing of metabolic marker genes provides targeted snapshots of community metabolic potential. However, inferring microbial taxonomy from metabolic marker gene sequences remains a challenge, particularly for the nitrogen fixation marker gene nitrogenase reductase (nifH). Here, we present Phylogenetic Placement for Inferring Taxonomy (PPIT), an R package that infers microbial taxonomy from nifH amplicons using both phylogenetic placement and sequence identity approaches. After users place query sequences on the reference nifH gene tree that PPIT provides (n = 6093 full-length nifH sequences), PPIT searches the phylogenetic neighborhood of each query sequence and attempts to infer microbial taxonomy. An inference is drawn only if the references in the phylogenetic neighborhood (1) are taxonomically consistent and (2) share sufficient pairwise identity with the query, thereby avoiding erroneous inferences due to known horizontal gene transfer events. We find that PPIT returns a higher proportion of correct taxonomic inferences than BLAST-based approaches, at the cost of returning fewer inferences overall. We demonstrate PPIT on deep-sea sediment and find that Deltaproteobacteria compose most of the potential diazotroph assemblage. We additionally discuss how users can apply PPIT to the analysis of other marker genes.
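
The two-condition rule described in the abstract can be illustrated with a minimal Python sketch. This is not PPIT's code or API (PPIT is an R package). The names `Neighbor` and `infer_taxon`, the `min_identity` parameter, and the example values are all hypothetical, and the sketch assumes that every reference in the neighborhood must meet the identity cutoff, which this record does not specify.

```python
from typing import NamedTuple, Optional, Sequence


class Neighbor(NamedTuple):
    """A reference sequence near the query on the tree (hypothetical structure)."""
    taxon: str        # taxonomic label of the reference at the rank of interest
    identity: float   # percent pairwise identity between reference and query


def infer_taxon(neighbors: Sequence[Neighbor], min_identity: float) -> Optional[str]:
    """Return a taxon only if both conditions hold; otherwise return None.

    Condition 1: all neighbors carry the same taxonomic label.
    Condition 2: all neighbors meet the identity cutoff (an assumption;
    the record does not say whether all or only some references must pass).
    """
    if not neighbors:
        return None  # empty neighborhood: nothing to infer from
    taxa = {n.taxon for n in neighbors}
    if len(taxa) != 1:
        return None  # taxonomically inconsistent neighborhood
    if any(n.identity < min_identity for n in neighbors):
        return None  # at least one reference is too divergent from the query
    return next(iter(taxa))


# Illustrative values only; the cutoff below is not taken from PPIT.
consistent = [Neighbor("Deltaproteobacteria", 93.0), Neighbor("Deltaproteobacteria", 95.5)]
mixed = [Neighbor("Deltaproteobacteria", 93.0), Neighbor("Firmicutes", 94.0)]
divergent = [Neighbor("Deltaproteobacteria", 70.0), Neighbor("Deltaproteobacteria", 95.5)]

print(infer_taxon(consistent, min_identity=90.0))
print(infer_taxon(mixed, min_identity=90.0))
print(infer_taxon(divergent, min_identity=90.0))
```

Declining to return a label when either check fails reflects the trade-off stated in the abstract: fewer inferences in total, but a higher proportion of correct ones.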

Identifier
Source https://data.blue-cloud.org/search-details?step=~012BEED253F29BF5EEC472500532C6A1A8E93B3DE90
Metadata Access https://data.blue-cloud.org/api/collections/BEED253F29BF5EEC472500532C6A1A8E93B3DE90
Provenance
Instrument Illumina MiSeq; ILLUMINA
Publisher Blue-Cloud Data Discovery & Access service; ELIXIR-ENA
Contributor STANFORD UNIVERSITY
Publication Year 2024
OpenAccess true
Contact blue-cloud-support(at)maris.nl
Representation
Discipline Marine Science
Spatial Coverage (-124.922W, 35.689S, -122.544E, 37.134N)
Temporal Coverage Begin 2017-03-01T00:00:00Z
Temporal Coverage End 2020-03-09T00:00:00Z