Multiple structural alignment for distantly related all β structures using TOPS pattern discovery and simulated annealing

Abstract
Topsalign is a method for structurally aligning diverse protein structures, for example, members of the same protein superfold. All proteins within a superfold share the same fold but often have very low sequence identity and different biological and biochemical functions. There is often significant structural diversity around the common scaffold of secondary structure elements of the fold. Topsalign uses TOPS topological descriptions of proteins. A pattern discovery algorithm identifies equivalent secondary structure elements across a set of proteins, and these equivalences are used to produce an initial multiple structure alignment, which is then optimized by simulated annealing. The output of Topsalign is a multiple structure‐based sequence alignment and a 3D superposition of the structures. The method has been tested on three superfolds: the β jelly roll, the TIM (α/β) barrel and the OB fold. Topsalign outperforms established methods on very diverse structures. Although the pattern discovery works only on β strand secondary structure elements, Topsalign is shown to align TIM (α/β) barrel superfamilies, which contain both α helices and β strands.
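The abstract does not specify Topsalign's scoring function or move set, so the following is only a generic sketch of how simulated annealing can refine an initial multiple alignment. It assumes each protein is reduced to a hypothetical list of integer positions of its secondary structure elements, that an alignment is a per-protein integer offset (protein 0 held fixed as the reference), and that the objective simply counts elements from different proteins that land on the same shifted position. The functions `score` and `anneal` and all parameter values are illustrative assumptions, not the published method.

```python
import math
import random


def score(seqs, offsets):
    """Hypothetical alignment score: count pairs of elements from different
    proteins that coincide after each protein is shifted by its offset."""
    shifted = [{p + off for p in s} for s, off in zip(seqs, offsets)]
    total = 0
    for i in range(len(shifted)):
        for j in range(i + 1, len(shifted)):
            total += len(shifted[i] & shifted[j])
    return total


def anneal(seqs, steps=5000, t_start=2.0, t_end=0.01, seed=0):
    """Generic simulated annealing over per-protein offsets (needs >= 2 proteins).

    Moves shift one non-reference protein by +/-1; worse moves are accepted
    with Metropolis probability exp(delta / T) under a geometric cooling schedule.
    Returns the best offsets seen and their score.
    """
    rng = random.Random(seed)
    offsets = [0] * len(seqs)
    current = score(seqs, offsets)
    best, best_offsets = current, list(offsets)
    for k in range(steps):
        # Temperature decays geometrically from t_start to t_end.
        t = t_start * (t_end / t_start) ** (k / (steps - 1))
        i = rng.randrange(1, len(seqs))  # protein 0 stays fixed as the reference
        candidate = list(offsets)
        candidate[i] += rng.choice((-1, 1))
        new = score(seqs, candidate)
        delta = new - current
        if delta >= 0 or rng.random() < math.exp(delta / t):
            offsets, current = candidate, new
            if current > best:
                best, best_offsets = current, list(offsets)
    return best_offsets, best


# Toy input: three shifted copies of the same hypothetical element pattern.
seqs = [[0, 5, 9, 14], [3, 8, 12, 17], [1, 6, 10, 15]]
offsets, best = anneal(seqs)
print(offsets, best)
```

Tracking the best state separately from the current one means occasional uphill moves at high temperature, which let the search escape local optima, never lose a good alignment already found.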