Data from: Phylogenomic incongruence, hypothesis testing, and taxonomic sampling: the monophyly of characiform fishes

Phylogenomic studies using genome‐wide datasets are quickly becoming the state of the art for systematics and comparative studies, but in many cases, they result in strongly supported incongruent results. The extent to which this conflict is real depends on different sources of error potentially affecting big datasets (assembly, stochastic, and systematic error). Here, we apply a recently developed methodology (GGI or gene genealogy interrogation) and data curation to new and published datasets with more than 1000 exons, 500 ultraconserved element (UCE) loci, and transcriptomic sequences that support incongruent hypotheses. The contentious non‐monophyly of the order Characiformes proposed by two studies is shown to be a spurious outcome induced by sample contamination in the transcriptomic dataset and an ambiguous result due to poor taxonomic sampling in the UCE dataset. By exploring the effects of number of taxa and loci used for analysis, we show that the power of GGI to discriminate among competing hypotheses is diminished by limited taxonomic sampling, but not equally sensitive to gene sampling. Taken together, our results reinforce the notion that merely increasing the number of genetic loci for a few representative taxa is not a robust strategy to advance phylogenetic knowledge of recalcitrant groups. We leverage the expanded exon capture dataset generated here for Characiformes (206 species in 23 out of 24 families) to produce a comprehensive phylogeny and a revised classification of the order.

Identifier
DOI https://doi.org/10.5061/dryad.vb76b45
PID https://nbn-resolving.org/urn:nbn:nl:ui:13-rr-lnxq
Metadata Access https://easy.dans.knaw.nl/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=oai:easy.dans.knaw.nl:easy-dataset:116411
Provenance
Creator Betancur-R., Ricardo; Arcila, Dahiana; Vari, Richard P.; Hughes, Lily C.; Oliveira, Claudio; Sabaj, Mark H.; Ortí, Guillermo
Publisher Data Archiving and Networked Services (DANS)
Publication Year 2019
Rights info:eu-repo/semantics/openAccess; License: http://creativecommons.org/publicdomain/zero/1.0
OpenAccess true
Representation
Resource Type Dataset
Discipline Life Sciences; Medicine