Abstract
Empirical scoring functions provide estimates of the free energy of protein–ligand binding in situations where atomic-scale simulations are intractable, for example, in virtual high-throughput screening. Currently, such scoring functions are often inaccurate, and further improvements are complicated by the lack of reliable training data, the complex interplay between scoring functions and docking algorithms, and an inconsistent statistical treatment of positive and negative training data. Compared with various other performance measures of scoring functions, “analysis of variance” provides a well-behaved objective function for optimization that focuses on the signal-to-noise ratio of ligand–decoy discrimination. In combination with a large database of ligands and decoys, an in situ optimization of scoring function parameters generated improved, target-specific scoring functions for three different proteins of pharmaceutical interest: cyclin-dependent kinase 2, the estrogen receptor, and cyclooxygenase-2. Statistical analysis of the improvements observed in “receiver-operating characteristic” curves showed that the optimized scoring functions achieved a significantly (between p < 0.0001 and p < 0.05) higher enrichment of true ligands. A scaffold dependence of the resulting binding modes was observed and is discussed in conjunction with the rigid receptor hypothesis commonly adopted in protein–ligand docking. In summary, the approach described here represents a well-adapted statistical method for setting up scoring functions.
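
To make the signal-to-noise idea behind the analysis-of-variance objective concrete, the following is a minimal sketch of a one-way ANOVA F statistic computed from the docking scores of two groups, ligands and decoys. It is an illustration under stated assumptions, not the implementation used in this work: the function name `anova_f` and the example scores are hypothetical, and only the two-group case is shown.

```python
from statistics import mean


def anova_f(ligand_scores, decoy_scores):
    """One-way ANOVA F statistic for two groups of scores (hypothetical helper).

    The numerator measures how far the group means lie apart (signal);
    the denominator measures the spread of scores within each group (noise).
    """
    groups = [list(ligand_scores), list(decoy_scores)]
    if any(len(g) == 0 for g in groups):
        raise ValueError("each group needs at least one score")

    all_scores = [x for g in groups for x in g]
    n_total = len(all_scores)
    n_groups = len(groups)
    if n_total <= n_groups:
        raise ValueError("not enough scores to estimate within-group variance")

    grand_mean = mean(all_scores)
    group_means = [mean(g) for g in groups]

    # Between-group sum of squares: separation of the group means.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group sum of squares: scatter of scores around their own mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    if ss_within == 0:
        raise ValueError("within-group variance is zero; F is undefined")

    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)
    return ms_between / ms_within


# Made-up scores for illustration only.
f_value = anova_f([-9.0, -8.0, -10.0], [-5.0, -6.0, -4.0])
print(f_value)
```

In a parameter optimization of the kind summarized above, a quantity like this would be recomputed for each trial parameter set, with larger values indicating a cleaner separation of ligand scores from decoy scores relative to the scatter within each group.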