Sparse Supermatrices for Phylogenetic Inference: Taxonomy, Alignment, Rogue Taxa, and the Phylogeny of Living Turtles
Open Access
- 1 January 2010
- journal article
- research article
- Published by Oxford University Press (OUP) in Systematic Biology
- Vol. 59 (1), 42-58
- https://doi.org/10.1093/sysbio/syp075
Abstract
As phylogenetic data sets grow in size and number, objective methods to summarize this information are becoming increasingly important. Supermatrices can combine existing data directly and in principle provide effective syntheses of phylogenetic information that may reveal new relationships. However, several serious difficulties exist in the construction of large supermatrices that must be overcome before these approaches will enjoy broad utility. We present analyses that examine the performance of sparse supermatrices constructed from large sequence databases for the reconstruction of species-level phylogenies. We develop a largely automated informatics pipeline that allows for the construction of sparse supermatrices from GenBank data. In doing so, we develop strategies for alleviating some of the outstanding impediments to accurate phylogenetic inference using these approaches. These include taxonomic standardization, automated alignment, and the identification of rogue taxa. We use turtles as an exemplar clade and present a well-supported species-level phylogeny for two-thirds of all turtle species based on a ∼50 kb supermatrix consisting of 93% missing data. Finally, we discuss some of the remaining pitfalls and concerns associated with supermatrix analyses, provide comparisons to supertree approaches, and suggest areas for future research.
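To make the notion of a sparse supermatrix concrete, the following is a minimal Python sketch, not the authors' pipeline. It assumes each locus has already been aligned and is held as a mapping from taxon name to aligned sequence. The function names `build_supermatrix` and `missing_fraction`, the taxon labels, and the choice to count both `?` and `-` as missing are all illustrative assumptions.

```python
from typing import Dict, List


def build_supermatrix(loci: List[Dict[str, str]], missing: str = "?") -> Dict[str, str]:
    """Concatenate per-locus alignments, padding unsampled taxa with `missing`.

    Hypothetical helper: each element of `loci` maps taxon name -> aligned sequence.
    """
    # Every taxon seen in any locus gets a row in the supermatrix.
    taxa = sorted({name for locus in loci for name in locus})
    rows = {name: [] for name in taxa}
    for locus in loci:
        lengths = {len(seq) for seq in locus.values()}
        if len(lengths) != 1:
            raise ValueError("each locus must be a non-empty alignment of equal-length sequences")
        width = lengths.pop()
        for name in taxa:
            # A taxon not sampled for this locus contributes a block of missing data.
            rows[name].append(locus.get(name, missing * width))
    return {name: "".join(parts) for name, parts in rows.items()}


def missing_fraction(matrix: Dict[str, str], missing_chars: str = "?-") -> float:
    """Fraction of cells that are missing (assumption: '?' and '-' both count)."""
    total = sum(len(seq) for seq in matrix.values())
    absent = sum(seq.count(ch) for seq in matrix.values() for ch in missing_chars)
    return absent / total if total else 0.0


# Toy input: two loci, three hypothetical taxa, each locus sampled for two of them.
loci = [
    {"taxon_A": "ACGT", "taxon_B": "ACGA"},
    {"taxon_B": "GG", "taxon_C": "GA"},
]
matrix = build_supermatrix(loci)
fraction = missing_fraction(matrix)
```

In this toy case 6 of the 18 cells are padding, so the missing fraction is one third. The same calculation on a matrix where most taxa are sampled for only a few loci gives values like the 93% reported in the abstract.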