Endogeneity in Nonparametric and Semiparametric Regression Models

Abstract
INTRODUCTION

The analysis of data with endogenous regressors (that is, observable explanatory variables that are correlated with unobservable error terms) is arguably the main contribution of econometrics to statistical science. Although “endogeneity” can arise from a number of different sources, including mismeasured regressors, sample selection, heterogeneous treatment effects, and correlated random effects in panel data, the term originally arose in the context of “simultaneity,” in which the explanatory variables were determined jointly with the dependent variable through a system of equations, so that their correlation with the error terms arose from feedback from the dependent variable to the explanatory variables. Analysis of linear supply-and-demand systems (with normal errors) yielded the familiar rank and order conditions for identification, two- and three-stage estimation methods, and analysis of structural interventions. Although these multistep estimation procedures have been extended to nonlinear parametric models with additive nonnormal errors (e.g., Amemiya, 1974, and Hansen, 1982), extensions to nonparametric and semiparametric models have only recently been considered.

The aim of this chapter is to examine the existing literature on estimation of some “nonparametric” models with endogenous explanatory variables, to compare the different identifying assumptions and estimation approaches for particular models, and to determine their applicability to others. To maintain a manageable scope for the chapter, we restrict our attention to nonparametric and semiparametric extensions of the usual simultaneous equations models (with endogenous regressors that are continuously distributed). We consider the identification and estimation of the “average structural function” and argue that it is a parameter of central interest in the analysis of semiparametric and nonparametric models with endogenous regressors.