Improved molecular replacement by density- and energy-guided protein structure optimization

Abstract
With more than 60,000 structures now available in the Protein Data Bank, it is frequently possible to create homology-based models to help solve the X-ray crystal structure of a protein with an unknown three-dimensional structure. But current techniques usually fail when the protein of interest has less than 30% sequence identity to known structures. A new method able to overcome this limitation has been developed and applied successfully to 8 of 13 X-ray diffraction data sets that could not be solved by conventional means. The new method should allow rapid structure determination without experimental phase information for more than half the cases in which current methods fail, as long as the resolution is 3.2 Å or better, there are four or fewer copies in the asymmetric unit, and structures of homologous proteins with more than 20% sequence identity are available.

Molecular replacement [1,2,3,4] procedures, which search for placements of a starting model within the crystallographic unit cell that best account for the measured diffraction amplitudes, followed by automatic chain tracing methods [5,6,7,8], have allowed the rapid solution of large numbers of protein crystal structures. Despite extensive work [9,10,11,12,13,14], molecular replacement or the subsequent rebuilding usually fails with more divergent starting models based on remote homologues with less than 30% sequence identity. Here we show that this limitation can be substantially reduced by combining algorithms for protein structure modelling with those developed for crystallographic structure determination. An approach integrating Rosetta structure modelling with Autobuild chain tracing yielded high-resolution structures for 8 of 13 X-ray diffraction data sets that could not be solved in the laboratories of expert crystallographers and that remained unsolved after application of an extensive array of alternative approaches. We estimate that the new method should allow rapid structure determination without experimental phase information for over half the cases where current methods fail, given diffraction data sets of better than 3.2 Å resolution, four or fewer copies in the asymmetric unit, and the availability of structures of homologous proteins with >20% sequence identity.