Context-dependent optimal substitution matrices

Abstract
Substitution matrices are a key tool in applications such as identifying sequence homologies, constructing sequence alignments and, more recently, using evolutionary patterns to predict protein structure. We have developed a novel approach to deriving these matrices that uses not only multiple sequence alignments but also the associated evolutionary trees. The key to our method is a Bayesian formalism for calculating the probability that a given substitution matrix fits the tree structures and the multiple sequence alignment data. Using this procedure, we can determine optimal substitution matrices for different local environments, defined by parameters such as secondary structure and surface accessibility.
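The Bayesian scoring idea can be illustrated with a minimal sketch. It assumes a continuous-time Markov model: each candidate substitution matrix is a rate matrix Q (rows summing to zero) with stationary frequencies pi, and the likelihood P(data | Q, tree) of each alignment column is computed with Felsenstein's pruning algorithm. Bayes' rule then gives P(Q_k | data) ∝ P(data | Q_k, tree) P(Q_k). This is not the authors' procedure. It compares a small, fixed set of candidate matrices rather than optimizing one, uses a toy two-state alphabet instead of the 20 amino acids, and treats the environment as a per-column label. All function names and the "buried"/"exposed" labels are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple, Union

Matrix = List[List[float]]
# Hypothetical tree encoding: a leaf is its name; an internal node is a list of (subtree, branch_length).
Tree = Union[str, List[Tuple["Tree", float]]]
Column = Dict[str, int]  # leaf name -> state index in one alignment column


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def expm(q: Matrix, t: float, terms: int = 20) -> Matrix:
    """Transition probabilities P(t) = exp(Q t) via scaling and squaring plus a Taylor series."""
    n = len(q)
    a = [[q[i][j] * t for j in range(n)] for i in range(n)]
    norm = max(sum(abs(x) for x in row) for row in a)
    squarings = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    a = [[x / 2 ** squarings for x in row] for row in a]  # scaled norm is at most 0.5
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[r + s for r, s in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def partial_likelihood(tree: Tree, column: Column, q: Matrix) -> List[float]:
    """Felsenstein pruning: entry s is P(leaf states below this node | node in state s)."""
    n = len(q)
    if isinstance(tree, str):
        return [1.0 if s == column[tree] else 0.0 for s in range(n)]
    vec = [1.0] * n
    for child, length in tree:
        child_vec = partial_likelihood(child, column, q)
        p = expm(q, length)
        for s in range(n):
            vec[s] *= sum(p[s][x] * child_vec[x] for x in range(n))
    return vec


def log_likelihood(tree: Tree, columns: Sequence[Column], q: Matrix, pi: Sequence[float]) -> float:
    """log P(alignment | Q, tree), assuming columns evolve independently."""
    total = 0.0
    for column in columns:
        root = partial_likelihood(tree, column, q)
        total += math.log(sum(pi[s] * root[s] for s in range(len(pi))))
    return total


def posterior_over_matrices(tree: Tree, columns: Sequence[Column], candidates: Sequence[Matrix],
                            pi: Sequence[float], prior: Sequence[float]) -> List[float]:
    """Bayes' rule over a finite candidate set, normalized with log-sum-exp."""
    logs = [log_likelihood(tree, columns, q, pi) + math.log(p) for q, p in zip(candidates, prior)]
    top = max(logs)
    weights = [math.exp(x - top) for x in logs]
    z = sum(weights)
    return [w / z for w in weights]


def best_matrix_per_environment(tree: Tree, columns: Sequence[Column], environments: Sequence[str],
                                candidates: Sequence[Matrix], pi: Sequence[float],
                                prior: Sequence[float]) -> Dict[str, int]:
    """Group columns by a hypothetical environment label; return the MAP candidate index per group."""
    groups: Dict[str, List[Column]] = {}
    for column, env in zip(columns, environments):
        groups.setdefault(env, []).append(column)
    best: Dict[str, int] = {}
    for env, cols in groups.items():
        post = posterior_over_matrices(tree, cols, candidates, pi, prior)
        best[env] = max(range(len(post)), key=post.__getitem__)
    return best


if __name__ == "__main__":
    slow = [[-0.1, 0.1], [0.1, -0.1]]  # toy "conserved" model
    fast = [[-2.0, 2.0], [2.0, -2.0]]  # toy "variable" model
    tree = [("A", 1.0), ([("B", 0.5), ("C", 0.5)], 0.5)]
    columns = [{"A": 0, "B": 0, "C": 0}, {"A": 1, "B": 1, "C": 1},
               {"A": 0, "B": 1, "C": 0}, {"A": 1, "B": 0, "C": 0}]
    envs = ["buried", "buried", "exposed", "exposed"]
    print(best_matrix_per_environment(tree, columns, envs, [slow, fast], [0.5, 0.5], [0.5, 0.5]))
    # prints {'buried': 0, 'exposed': 1}
```

Here the conserved columns favour the slow candidate and the variable columns favour the fast one, which is the kind of environment-dependent preference the abstract describes. For realistic 20-state matrices, caching exp(Q t) per branch instead of recomputing it for every column would be the first optimization to make.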