Solvation free energies of amino acid side chain analogs for common molecular mechanics water models

Abstract
Quantitative free energy computation involves both using a model that is sufficiently faithful to the experimental system under study (accuracy) and establishing statistically meaningful measures of the uncertainties resulting from finite sampling (precision). To examine how accurately a range of common water models used for protein simulation reproduce solute/solvent properties, we calculate the free energy of hydration of 15 amino acid side chain analogs derived from the OPLS-AA parameter set with the TIP3P, TIP4P, SPC, SPC/E, TIP3P-MOD, and TIP4P-Ew water models. We achieve a high degree of statistical precision in our simulations, obtaining uncertainties for the free energy of hydration of 0.02-0.06 kcal/mol, equivalent to those obtained in experimental hydration free energy measurements of the same molecules. We find that TIP3P-MOD, a model designed to give an improved free energy of hydration for methane, uniformly gives the closest match to experiment; we also find that the ability to accurately model pure water properties does not necessarily imply the ability to predict solute/solvent behavior. We also evaluate the hydration free energies obtained with a number of novel modifications of TIP3P, designed as a proof of concept that it is possible to obtain much better solute/solvent free energetic behavior without substantially degrading pure water properties. We decrease the average error to zero while reducing the root mean square error below that of any of the published water models, with measured liquid water properties remaining almost constant with respect to our perturbations. This demonstrates that there is still both room for improvement within current fixed-charge biomolecular force fields and significant parameter flexibility to make these improvements.
Recent research on the computational efficiency of free energy methods allows us to perform, on a local cluster, simulations that previously required large-scale distributed computing: we perform four times as much computational work in approximately a tenth of the computer time used by a similar study a year ago.
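
The abstract uses two summary statistics to compare computed and experimental hydration free energies: the average (signed) error and the root mean square error. The following is a minimal sketch of how such statistics are typically computed, not the authors' analysis code. The function name `error_summary` and the input values are hypothetical placeholders chosen for illustration; they are not data from this study.

```python
import math


def error_summary(computed, experimental):
    """Return (mean signed error, RMS error) of computed vs. experimental values.

    Both inputs are equal-length sequences of free energies in the same units
    (for example kcal/mol). Hypothetical helper, for illustration only.
    """
    if len(computed) != len(experimental) or len(computed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Signed deviation of each computed value from its experimental value.
    diffs = [c - e for c, e in zip(computed, experimental)]
    mean_error = sum(diffs) / len(diffs)
    rms_error = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mean_error, rms_error


# Placeholder values only; NOT results from the paper.
computed = [1.0, -2.0, 3.0]
experimental = [0.5, -2.5, 4.0]

mean_error, rms_error = error_summary(computed, experimental)
print(f"mean signed error = {mean_error:.3f}, RMS error = {rms_error:.3f}")
```

In this toy input the signed deviations cancel, so the mean error is zero while the RMS error is not. This is why an average error of zero and a reduced root mean square error are reported as separate results: the first measures systematic bias, the second the typical size of the deviation.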