Efficient computation of the phylogenetic likelihood function on multi-gene alignments and multi-core architectures

Open Access

7 October 2008

journal article
Published by The Royal Society in Philosophical Transactions of the Royal Society B: Biological Sciences

Vol. 363 (1512), 3977-3984
https://doi.org/10.1098/rstb.2008.0163

Abstract

The continuous accumulation of sequence data, driven for example by novel wet-laboratory techniques such as pyrosequencing, together with the increasing popularity of multi-gene phylogenies and the emergence of multi-core processor architectures that suffer from cache congestion, poses new challenges for the efficient computation of the phylogenetic maximum-likelihood (ML) function. Here, we propose two approaches that can significantly speed up likelihood computations, which typically account for over 95 per cent of the computational effort in current ML or Bayesian inference programs. First, we present a method and an appropriate data structure to efficiently compute the likelihood score on ‘gappy’ multi-gene alignments. By ‘gappy’ we denote sampling-induced gaps owing to missing sequences in individual genes (partitions), i.e. not real alignment gaps. A first proof-of-concept implementation in RAxML indicates that this approach can accelerate inferences on large and gappy alignments by approximately one order of magnitude. Second, we present insights and initial performance results on multi-core architectures obtained during the transition from an OpenMP-based to a Pthreads-based fine-grained parallelization of the ML function.
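The observation behind the ‘gappy’ speed-up can be illustrated with a small sketch. It is not the data structure used in RAxML, which the abstract does not describe. It only shows why work can be skipped: under Felsenstein's pruning algorithm, a subtree whose taxa are all missing in a partition contributes a factor of 1 for every state, so it need not be traversed. The Jukes-Cantor model, the nested-list tree encoding, the shared branch lengths across partitions and all function names are assumptions made for illustration.

```python
import math

STATES = "ACGT"

def jc69_matrix(t):
    """Jukes-Cantor transition probabilities for branch length t (assumed model)."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def tip_vector(ch):
    """Conditional likelihoods at a tip; '-' (missing) is compatible with every state."""
    if ch == "-":
        return [1.0] * 4
    return [1.0 if s == ch else 0.0 for s in STATES]

def has_data(node, seqs):
    """True if any taxon below `node` has a sequence in this partition.
    A real implementation would precompute this once per partition."""
    if isinstance(node, str):
        return seqs[node] is not None
    return any(has_data(child, seqs) for child, _ in node)

def cond_like(node, seqs, site, skip_empty, counter):
    """Pruning for one site. A node is a taxon name (str) or a list of
    (child, branch_length) pairs. counter[0] counts inner-node updates."""
    if isinstance(node, str):
        seq = seqs[node]
        return tip_vector("-" if seq is None else seq[site])
    counter[0] += 1
    vec = [1.0] * 4
    for child, t in node:
        if skip_empty and not has_data(child, seqs):
            continue  # all-missing subtree: factor of 1 for every state
        p = jc69_matrix(t)
        cv = cond_like(child, seqs, site, skip_empty, counter)
        for i in range(4):
            vec[i] *= sum(p[i][j] * cv[j] for j in range(4))
    return vec

def log_likelihood(tree, seqs, n_sites, skip_empty):
    """Log-likelihood of one partition plus the number of inner-node updates."""
    counter = [0]
    total = 0.0
    for site in range(n_sites):
        root = cond_like(tree, seqs, site, skip_empty, counter)
        total += math.log(sum(0.25 * x for x in root))
    return total, counter[0]

# Hypothetical partition: taxa D and E were not sequenced for this gene.
tree = [([("A", 0.1), ("B", 0.2)], 0.05),
        ([("C", 0.3), ([("D", 0.1), ("E", 0.15)], 0.2)], 0.07)]
seqs = {"A": "ACGT", "B": "ACGA", "C": "CCGT", "D": None, "E": None}

full_ll, full_updates = log_likelihood(tree, seqs, 4, skip_empty=False)
fast_ll, fast_updates = log_likelihood(tree, seqs, 4, skip_empty=True)
```

In this toy case both calls yield the same log-likelihood up to rounding, while the skipping variant performs fewer inner-node updates. The saving grows with the fraction of taxa missing from a partition, which is in line with the speed-up on large, gappy alignments reported above.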
