The NMR restraints grid at BMRB for 5,266 protein and nucleic acid PDB entries

Abstract
Several pilot experiments have indicated that older NMR structures can be improved by applying modern software and new protocols (Nabuurs et al. in Proteins 55:483–486, 2004; Nederveen et al. in Proteins 59:662–672, 2005; Saccenti and Rosato in J Biomol NMR 40:251–261, 2008). A recent large-scale X-ray study has also shown that modern software can significantly improve the quality of X-ray structures that were deposited more than a few years ago (Joosten et al. in J Appl Crystallogr 42:376–384, 2009; Sanderson in Nature 459:1038–1039, 2009). Recalculating three-dimensional coordinates requires that the original experimental data be available and complete, and that they be semantically and syntactically correct, or at least correct enough to be reconstructed. For multiple reasons, including a lack of standards, the heterogeneity of the experimental data, and the many NMR experiment types, it has not been practical to parse a large proportion of the originally deposited NMR experimental data files related to protein NMR structures. As a result, automatic recalculation, and thus improvement, of the three-dimensional coordinates of these structures has been impractical. Here we describe a large-scale international collaborative effort to make all deposited experimental NMR data semantically and syntactically homogeneous, and thus useful for further research. A total of 4,014 out of 5,266 entries were ‘cleaned’ in this process. For 1,387 entries, human intervention was needed. Continuing efforts to automate the parsing of both old and newly deposited files are steadily decreasing the fraction of entries that requires human intervention. The cleaned data files are available from the NMR restraints grid at http://restraintsgrid.bmrb.wisc.edu.