Data from: Phylogenetic signal and noise: predicting the power of a data set to resolve phylogeny

A principal objective for phylogenetic experimental design is to predict the power of a dataset to resolve nodes in a phylogenetic tree. However, proactively assessing the potential for phylogenetic noise compared to signal in a candidate dataset has been a formidable challenge. Understanding the impact of collecting additional sequence data on the resolution of recalcitrant internodes at diverse historical times will facilitate increasingly accurate and cost-effective phylogenetic research. Here, we derive theory based on the fundamental unit of the phylogenetic tree, the quartet, that applies estimates of the state space and the rates of evolution of characters in a dataset to predict phylogenetic signal and phylogenetic noise and therefore to predict the power to resolve internodes. We develop and implement a Monte Carlo approach to estimating the power to resolve, and we derive a nearly equivalent, faster deterministic calculation. These approaches are applied to describe the distribution of potential signal, polytomy, or noise for two example datasets, one recent (CO1 and 28S sequences from Diplazontinae parasitoid wasps) and one deep (eight nuclear genes and a phylogenomic sequence for diverse microbial eukaryotes including Stramenopiles, Alveolata, and Rhizaria). The predicted power of resolution for the loci analyzed is consistent with the historic use of the genes in phylogenetics.

Identifier
DOI https://doi.org/10.5061/dryad.61cg073t
PID https://nbn-resolving.org/urn:nbn:nl:ui:13-qy-5ov0
Metadata Access https://easy.dans.knaw.nl/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=oai:easy.dans.knaw.nl:easy-dataset:81135
Provenance
Creator Townsend, Jeffrey P.; Su, Zhuo; Tekle, Yonas I.
Publisher Data Archiving and Networked Services (DANS)
Publication Year 2012
Rights info:eu-repo/semantics/openAccess; License: http://creativecommons.org/publicdomain/zero/1.0
OpenAccess true
Representation
Resource Type Dataset
Discipline Life Sciences; Medicine