Theory of a Systematic Computational Error in Free Energy Differences

Abstract
Systematic inaccuracy is inherent in any computational estimate of a nonlinear average because only a finite number of data values, N, is available. Free energy differences ΔF between two states or systems are critically important examples of such averages. Previous work has demonstrated empirically that the “finite-sampling error” can be very large (many times k_B T) in ΔF estimates for simple molecular systems. Here we present a theoretical description of the inaccuracy, including the exact solution of a sample problem, the precise asymptotic behavior in terms of 1/N for large N, the identification of a universal law, and numerical illustrations. The theory relies on corrections to the central limit theorem and other limit theorems.
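
The finite-sampling error can be illustrated with a minimal numerical sketch. It is not the calculation of this paper; it rests on several assumptions made here purely for illustration: the estimate takes the exponential-average form ΔF_N = -ln[(1/N) Σ exp(-W_i)], with the work values W_i expressed in units of k_B T and drawn from a Gaussian distribution, for which the exact result is ΔF = μ - σ²/2. The function names `delta_f_estimate` and `mean_bias` and all parameter values are hypothetical.

```python
import numpy as np


def delta_f_estimate(work):
    """Finite-sample estimate -ln(mean(exp(-W))) along the last axis.

    `work` holds work values in units of k_B T (an assumption of this sketch).
    """
    w = np.asarray(work, dtype=float)
    # Shift by the smallest value so every exponent is <= 0 (avoids overflow).
    w_min = w.min(axis=-1, keepdims=True)
    return w_min[..., 0] - np.log(np.mean(np.exp(-(w - w_min)), axis=-1))


def mean_bias(n_samples, mu=0.0, sigma=2.0, n_trials=5000, seed=0):
    """Average of (estimate - exact value) over many independent trials.

    Assumes Gaussian work values, for which the exact free energy
    difference is mu - sigma**2 / 2 in units of k_B T.
    """
    rng = np.random.default_rng(seed)
    work = rng.normal(mu, sigma, size=(n_trials, n_samples))
    exact = mu - 0.5 * sigma**2
    return float(np.mean(delta_f_estimate(work)) - exact)


if __name__ == "__main__":
    # The average error is positive and shrinks as N grows.
    for n in (1, 10, 100):
        print(n, mean_bias(n))
```

For N = 1 the estimate reduces to a single work value, so under these assumptions the average error equals σ²/2 exactly; for larger N the average error remains positive, as Jensen's inequality requires for this estimator, and decreases with N.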