Theory of Systematic Computational Error in Free Energy Differences

Abstract
Systematic inaccuracy is inherent in any computational estimate of a non-linear average, because only a finite number of data values, N, is available. Free energy differences (DF) between two states or systems are critically important examples of such averages in physical, chemical and biological settings. Previous work has demonstrated empirically that the ``finite-sampling error'' can be very large -- many times kT -- in DF estimates for simple molecular systems. Here, we present a theoretical description of the inaccuracy, including the exact solution of a sample problem, the precise asymptotic behavior in terms of 1/N for large N, the identification of a universal law, and numerical illustrations. The theory relies on corrections to the central and other limit theorems, and thus a role is played by stable (Lévy) probability distributions.
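The finite-sampling error described above can be illustrated with a small numerical sketch. It is not taken from the paper: it assumes the standard exponential-average estimator DF_N = -kT ln[(1/N) sum_i exp(-W_i/kT)] and a hypothetical Gaussian distribution of the sampled values W, with the mean chosen so that the exact DF is zero. The function names and parameter values are illustrative only.

```python
import numpy as np

def free_energy_estimate(work, kT=1.0):
    """Finite-sample estimate DF_N = -kT * ln(mean(exp(-W/kT))).

    Assumed estimator form (exponential average); evaluated with the
    log-sum-exp trick so large |W|/kT does not overflow.
    """
    x = -np.asarray(work, dtype=float) / kT
    m = x.max()
    return -kT * (m + np.log(np.mean(np.exp(x - m))))

def mean_bias(n, sigma=2.0, kT=1.0, trials=2000, seed=0):
    """Average of DF_N over many independent data sets of size n.

    Hypothetical model: W ~ Normal(mu, sigma^2) with mu = sigma^2 / (2 kT),
    for which the exact DF = mu - sigma^2 / (2 kT) = 0. The returned mean
    is therefore the systematic error itself.
    """
    rng = np.random.default_rng(seed)
    mu = sigma**2 / (2.0 * kT)
    estimates = [
        free_energy_estimate(rng.normal(mu, sigma, n), kT)
        for _ in range(trials)
    ]
    return float(np.mean(estimates))

if __name__ == "__main__":
    # Print the average estimate for increasing sample size N.
    for n in (1, 10, 100, 1000):
        print(n, mean_bias(n))
```

Under these assumptions the averaged estimate lies above the exact value of zero and shrinks as N grows, which is the qualitative behavior the abstract refers to. The positive sign follows from Jensen's inequality applied to the logarithm.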