Some Statistical Problems in the Assessment of Inhomogeneities of DNA Sequence Data

Abstract
The fields of molecular genetics and medicine are accumulating DNA and protein sequence data at an accelerating rate. Discovering and interpreting sequence patterns can contribute to understanding molecular mechanisms and evolutionary processes. This article considers two types of statistical problems in these contexts. (1) The first is identifying anomalies in the distribution of a specified biochemical marker along a DNA string; in particular, new statistical methods are set forth for assessing excessive clustering, overdispersion, and excessive regularity of the marker along the sequence. Applications are given to the physical map data of the bacterium Escherichia coli. (2) The second concerns the assembly of cloned DNA segments, for which some results and statistical problems are described. Sections 2 and 3 of the article present background material on DNA organization and inheritance.