Identifying the rooted species tree from the distribution of unrooted gene trees under the coalescent
- 23 July 2010
- journal article
- research article
- Published by Springer Nature in Journal of Mathematical Biology
- Vol. 62 (6), 833-862
- https://doi.org/10.1007/s00285-010-0355-7
Abstract
Gene trees are evolutionary trees representing the ancestry of genes sampled from multiple populations. Species trees represent populations of individuals, each with many genes, splitting into new populations or species. The coalescent process, which models ancestry of gene copies within populations, is often used to model the probability distribution of gene trees given a fixed species tree. This multispecies coalescent model provides a framework for phylogeneticists to infer species trees from gene trees using maximum likelihood or Bayesian approaches. Because the coalescent models a branching process over time, all trees are typically assumed to be rooted in this setting. Often, however, gene trees inferred by traditional phylogenetic methods are unrooted. We investigate probabilities of unrooted gene trees under the multispecies coalescent model. We show that when there are four species with one gene sampled per species, the distribution of unrooted gene tree topologies identifies the unrooted species tree topology and some, but not all, information in the species tree edges (branch lengths). The location of the root on the species tree is not identifiable in this situation. However, for five or more species with one gene sampled per species, we show that the distribution of unrooted gene tree topologies identifies the rooted species tree topology and all its internal branch lengths. The length of any pendant branch leading to a leaf of the species tree is also identifiable for any species from which more than one gene is sampled.
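The four-species statement can be illustrated with a small numerical sketch. It assumes branch lengths measured in coalescent units and one gene sampled per species, and it uses closed forms that follow from a short coalescent calculation rather than anything quoted in the abstract: for the caterpillar species tree (((a,b):x,c):y,d) the matching unrooted gene tree ab|cd has probability 1 - (2/3)exp(-x), while for the balanced tree ((a,b):x,(c,d):y) it has probability 1 - (2/3)exp(-(x+y)). The function names below are hypothetical.

```python
import math


def quartet_probs(effective_length):
    """Probabilities of the three unrooted four-taxon gene tree topologies.

    effective_length is the total internal branch length (coalescent units)
    separating {a, b} from {c, d} in which a matching coalescence can occur.
    """
    no_coalescence = math.exp(-effective_length)
    return {
        "ab|cd": 1.0 - 2.0 * no_coalescence / 3.0,  # matches the species tree
        "ac|bd": no_coalescence / 3.0,
        "ad|bc": no_coalescence / 3.0,
    }


def caterpillar(x, y):
    """Species tree (((a,b):x,c):y,d). Only x affects the unrooted distribution."""
    return quartet_probs(x)


def balanced(x, y):
    """Species tree ((a,b):x,(c,d):y). Only the sum x + y matters."""
    return quartet_probs(x + y)


if __name__ == "__main__":
    # Two differently rooted species trees with the same unrooted gene tree distribution.
    print(caterpillar(1.0, 0.3))
    print(balanced(0.5, 0.5))
```

Under these assumptions, a caterpillar tree with x = 1 (and any y) and a balanced tree with x + y = 1 yield the same three probabilities, which is consistent with the abstract's claim that neither the root location nor every branch length can be recovered from four species.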