Can Deliberately Incomplete Gene Sample Augmentation Improve a Phylogeny Estimate for the Advanced Moths and Butterflies (Hexapoda: Lepidoptera)?
Open Access
- 11 August 2011
- journal article
- research article
- Published by Oxford University Press (OUP) in Systematic Biology
- Vol. 60 (6), 782-796
- https://doi.org/10.1093/sysbio/syr079
Abstract
This paper addresses the question of whether one can economically improve the robustness of a molecular phylogeny estimate by increasing gene sampling in only a subset of taxa, without having the analysis invalidated by artifacts arising from large blocks of missing data. Our case study stems from an ongoing effort to resolve poorly understood deeper relationships in the large clade Ditrysia (>150,000 species) of the insect order Lepidoptera (butterflies and moths). Seeking to remedy the overall weak support for deeper divergences in an initial study based on five nuclear genes (6.6 kb) in 123 exemplars, we nearly tripled the total gene sample (to 26 genes, 18.4 kb) but only in a third (41) of the taxa. The resulting partially augmented data matrix (45% intentionally missing data) consistently increased bootstrap support for groupings previously identified in the five-gene (nearly) complete matrix, while introducing no contradictory groupings of the kind that missing data have been predicted to produce. Our results add to growing evidence that data sets differing substantially in gene and taxon sampling can often be safely and profitably combined. The strongest overall support for nodes above the family level came from including all nucleotide changes, while partitioning sites into sets undergoing mostly nonsynonymous versus mostly synonymous change. In contrast, support for the deepest node for which any persuasive molecular evidence has yet emerged (78–85% bootstrap) was weak or nonexistent unless synonymous change was entirely excluded, a result plausibly attributed to compositional heterogeneity. This node (Gelechioidea + Apoditrysia), tentatively proposed by previous authors on the basis of four morphological synapomorphies, is the first major subset of ditrysian superfamilies to receive strong statistical support in any phylogenetic study.
A “more-genes-only” data set (41 taxa × 26 genes) also gave strong signal for a second deep grouping (Macrolepidoptera) that was obscured, but not strongly contradicted, in more taxon-rich analyses.
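The matrix dimensions quoted in the abstract allow a rough check of the missing-data figure. The sketch below is a back-of-envelope estimate, not the authors' calculation: the function name `missing_fraction` and its parameters are hypothetical, and it assumes every taxon is fully sequenced for its nominal gene set (6.6 kb for the 82 non-augmented taxa, 18.4 kb for the 41 augmented ones).

```python
def missing_fraction(n_taxa, n_augmented, base_kb, full_kb):
    """Fraction of empty cells in a taxa-by-sites matrix in which only
    n_augmented taxa carry the full gene set (hypothetical helper)."""
    # Total cells if every taxon had the full gene sample.
    total = n_taxa * full_kb
    # Cells actually filled: augmented taxa have full_kb, the rest base_kb.
    present = n_augmented * full_kb + (n_taxa - n_augmented) * base_kb
    return (total - present) / total


# Figures taken from the abstract: 123 taxa, 41 of them augmented,
# 6.6 kb for the five-gene set, 18.4 kb for the 26-gene set.
frac = missing_fraction(n_taxa=123, n_augmented=41, base_kb=6.6, full_kb=18.4)
print(f"{frac:.1%}")
```

Under these simplifying assumptions the estimate comes to about 43%, close to the 45% reported in the abstract. The small gap presumably reflects details the abstract does not give, such as exact per-gene alignment lengths.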