The significance of protein sequence similarities

Abstract
A general method of assessing the significance of scored best local alignments, particularly suited to protein sequence comparisons, is described. The method establishes the parameters describing the distribution of the best results from any search program, provided that the set is sufficiently large and the majority of the alignments arise from unrelated sequences. The expected frequency of occurrence of any score can then be calculated, together with the number of standard deviations above expectation. These provide sensible measures of significance without additional search operations. However, the biological significance of any alignment or set of alignments depends not solely on the improbability of the alignment, but on all relevant factors known to the biologist.
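
As an illustration only, the two measures mentioned above (expected frequency of a score and number of standard deviations above expectation) can be sketched in Python. The abstract does not state which distribution or fitting procedure is used, so this sketch assumes an extreme-value (Gumbel) model fitted by the method of moments; the function names `fit_gumbel`, `expected_count` and `z_score` are hypothetical. It also makes no attempt to exclude high scores from related sequences before fitting, which a real analysis would need to consider.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel(scores):
    """Method-of-moments estimate of Gumbel location (mu) and scale (beta).

    Assumption: best local alignment scores from unrelated sequences follow
    an extreme-value distribution. This is a modelling choice of the sketch.
    """
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    beta = sd * math.sqrt(6) / math.pi
    mu = mean - EULER_GAMMA * beta
    return mu, beta


def expected_count(score, mu, beta, n):
    """Expected number of alignments scoring >= score among n comparisons."""
    exponent = -(score - mu) / beta
    if exponent > 700:  # avoid overflow; the tail probability is 1 here
        return float(n)
    # P(S >= score) = 1 - exp(-exp(-(score - mu) / beta))
    p_tail = -math.expm1(-math.exp(exponent))
    return n * p_tail


def z_score(score, scores):
    """Number of sample standard deviations by which score exceeds the mean."""
    return (score - statistics.fmean(scores)) / statistics.stdev(scores)
```

With the scores of a completed search held in a list `scores`, one would call `mu, beta = fit_gumbel(scores)` once and then evaluate `expected_count(s, mu, beta, len(scores))` and `z_score(s, scores)` for any score `s` of interest, without running further searches.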