Deletion/Substitution/Addition Algorithm in Learning with Applications in Genomics

Abstract
Van der Laan and Dudoit (2003) provide a road map for estimation and performance assessment where a parameter of interest is defined as the risk minimizer for a suitable loss function and candidate estimators are generated using a loss function. After briefly reviewing this approach, this article proposes a general deletion/substitution/addition algorithm for minimizing, over subsets of variables (e.g., basis functions), the empirical risk of subset-specific estimators of the parameter of interest. This algorithm provides us with a new class of loss-based cross-validated algorithms in prediction of univariate outcomes, which can be extended to handle multivariate outcomes, conditional density and hazard estimation, and censored outcomes such as survival. In the context of regression, using polynomial basis functions, we study the properties of the deletion/substitution/addition algorithm in simulations and apply the method to detect transcription factor binding sites in yeast gene expression experiments.