Abstract
Two essential aspects of virtual screening are considered: experimental design and performance metrics. In the design of any retrospective virtual screen, choices have to be made as to the purpose of the exercise. Is the goal to compare methods? Is the interest in a particular type of target or in all targets? Are we simulating a ‘real-world’ setting, or teasing out distinguishing features of a method? What are the confidence limits for the results? What should be reported in a publication? In particular, what criteria should be used to decide between different performance metrics? Comparing the field of molecular modeling to other endeavors, such as medical statistics, criminology, or computer hardware evaluation, indicates some clear directions. Taken together, these suggest that the modeling field has a long way to go to provide effective assessment of its approaches, either to itself or to a broader audience, but that there are no technical reasons why progress cannot be made.