Inferring Tree Models for Oncogenesis from Comparative Genome Hybridization Data

Abstract
Comparative genome hybridization (CGH) is a laboratory method for measuring gains and losses of chromosomal regions in tumor cells. It is believed that DNA gains and losses in tumor cells do not occur entirely at random, but partly through some flow of causality. Models that relate tumor progression to the occurrence of DNA gains and losses could be very useful in hunting cancer genes and in cancer diagnosis. We lay some mathematical foundations for inferring a model of tumor progression from a CGH data set. We consider a class of tree models that is more general than the path model previously developed for colorectal cancer. We derive a tree model inference algorithm based on the idea of a maximum-weight branching in a graph, and we show that under plausible assumptions our algorithm infers the correct tree. We have implemented our methods in software, and we illustrate them with a CGH data set for renal cancer.