Log-Linear Analysis of Censored Survival Data with Partially Observed Covariates

Abstract
Log-linear models provide a flexible means of extending life table techniques for the analysis of censored survival data with categorical covariates, as discussed by Holford (1980) and Laird and Olivier (1981). We extend this methodology to incorporate cases in which one or more of the categorical covariates are sometimes missing. Maximum likelihood estimates of the parameters are calculated using data from all cases. This can result in large gains in efficiency over standard methods that require the exclusion of cases with incomplete data. With this approach, we assume that the hazard function, conditional on the covariates, is a step function that is constant within each of a set of disjoint time intervals. The model has two parts: a log-linear model describing the hazard parameters, and a multinomial model describing the probabilities in the contingency table defined by the covariates. The main interest is in the model for the hazard parameters. We show how to calculate maximum likelihood estimates of the model parameters either by applying the EM algorithm, with one cycle of iterative proportional fitting in the M step, or by using the Newton-Raphson algorithm. Estimates of standard errors are computed from the empirical information matrix. Our proposed maximum likelihood approach requires two assumptions in addition to the usual assumption of noninformative censoring. First, the mechanism causing missing covariates must be ignorable (Rubin 1976), in that the probability that a covariate is missing cannot depend on the covariate itself or on other covariates that are missing. Second, the distribution of the random censoring variable must not depend on any covariate that is missing.
The first example, investigating the influence of several covariates on time to diagnosis of high blood pressure in a large cohort of men, shows clear gains in efficiency for our approach over a complete-case analysis and illustrates the flexibility of the log-linear approach. A second example, on survival times of symptomatic and asymptomatic lymphoma patients, shows interesting differences between the complete-case analysis and the maximum likelihood approach, which could be due to a nonrandom missing-value mechanism.
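
To make the piecewise-constant hazard assumption concrete, the following is a minimal sketch in Python of the complete-data building block only: it splits each subject's follow-up across time intervals and estimates each interval's hazard as events divided by exposure. It is an illustration, not the estimation procedure of this paper: it ignores covariates and missing data, and the function names (`split_exposure`, `piecewise_hazard_mle`), the cut points, and the toy data are all hypothetical.

```python
def split_exposure(time, event, cuts):
    """Split one subject's follow-up across intervals.

    cuts: increasing left endpoints starting at 0, e.g. [0, 1, 3];
          the last interval is open-ended (hypothetical choice).
    Returns (exposure, events), one entry per interval.
    """
    n = len(cuts)
    exposure = [0.0] * n
    events = [0] * n
    for j, start in enumerate(cuts):
        end = cuts[j + 1] if j + 1 < n else float("inf")
        if time <= start:
            break  # subject left the risk set before this interval
        exposure[j] = min(time, end) - start  # time at risk in interval j
        if event and time <= end:
            events[j] = 1  # the failure falls in interval j
    return exposure, events


def piecewise_hazard_mle(times, events, cuts):
    """Pool exposure and event counts, then estimate each hazard as d_j / T_j."""
    total_exp = [0.0] * len(cuts)
    total_d = [0] * len(cuts)
    for t, e in zip(times, events):
        ex, d = split_exposure(t, e, cuts)
        for j in range(len(cuts)):
            total_exp[j] += ex[j]
            total_d[j] += d[j]
    hazards = [d / x if x > 0 else float("nan")
               for d, x in zip(total_d, total_exp)]
    return hazards, total_d, total_exp


# Toy data (made up): follow-up times and event indicators (1 = failure, 0 = censored).
times = [0.5, 2.0, 4.0, 1.5]
events = [1, 0, 1, 1]
hazards, d, T = piecewise_hazard_mle(times, events, cuts=[0, 1, 3])
print(hazards, d, T)
```

With categorical covariates, the same counts would be tabulated within each covariate cell, and a log-linear model for the hazards could be fitted by treating the event counts as Poisson with log exposure as an offset. Handling partially observed covariates would additionally require distributing incomplete cases over the possible cells, for example in the E step of an EM algorithm, which this sketch does not attempt.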