The Infinitely-Many-Sites Model as a Measure-Valued Diffusion

Abstract
The infinitely-many-sites model (with no recombination) is reformulated, with sites labelled by elements of $[0, 1]$ and "type" space $E = [0, 1]^{\mathbb{Z}_+}$. A gene is of type $\mathbf{x} = (x_0, x_1, \ldots) \in E$ if $x_0, x_1, \ldots$ is the sequence of sites at which mutations have occurred in the line of descent of that gene. The model is approximated by a diffusion process taking values in $\mathscr{P}^0_a(E)$, the set of purely atomic Borel probability measures $\mu$ on $E$ with the property that the locations of every $n \geq 1$ atoms of $\mu$ form a family tree, and the diffusion is shown to have a unique stationary distribution $\tilde{\mu}$. The principal object of investigation is the $\tilde{\mu}(d\mu)$-expectation of the probability that a random sample from a population with types distributed according to $\mu$ has a given tree structure. Ewens' (1972) sampling formula and Watterson's (1975) segregating-sites distribution are obtained as corollaries.
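
As a point of reference for the two corollaries named above, the following is a minimal sketch of the standard textbook forms of Ewens' sampling formula and of the mean of Watterson's segregating-sites distribution. It is not the derivation carried out in the paper. The symbol `theta` (the scaled mutation rate) and the function names are assumptions introduced here for illustration and do not appear in the abstract. Exact rational arithmetic is used so that the normalisation can be checked without rounding error.

```python
from fractions import Fraction
from math import factorial


def rising_factorial(theta, n):
    """Return theta * (theta + 1) * ... * (theta + n - 1)."""
    result = Fraction(1)
    for i in range(n):
        result *= theta + i
    return result


def ewens_probability(counts, theta):
    """Ewens' sampling formula in its standard form (assumed notation).

    counts[j - 1] = a_j is the number of allelic types represented
    exactly j times in the sample; the sample size is n = sum_j j * a_j.
    """
    n = sum(j * a for j, a in enumerate(counts, start=1))
    prob = Fraction(factorial(n)) / rising_factorial(theta, n)
    for j, a in enumerate(counts, start=1):
        prob *= (Fraction(theta) / j) ** a / factorial(a)
    return prob


def allelic_partitions(n):
    """Yield every tuple (a_1, ..., a_n) with sum_j j * a_j == n."""
    def build(remaining, j):
        if j > n:
            if remaining == 0:
                yield ()
            return
        for a in range(remaining // j + 1):
            for rest in build(remaining - j * a, j + 1):
                yield (a,) + rest
    yield from build(n, 1)


def watterson_mean_segregating_sites(n, theta):
    """Mean number of segregating sites: theta * sum_{j=1}^{n-1} 1/j."""
    return theta * sum(Fraction(1, j) for j in range(1, n))


if __name__ == "__main__":
    theta = Fraction(3, 2)  # hypothetical value, chosen only for the demo
    total = sum(ewens_probability(a, theta) for a in allelic_partitions(5))
    print("Ewens probabilities for n = 5 sum to", total)
    print("Mean segregating sites for n = 5:",
          watterson_mean_segregating_sites(5, theta))
```

Summing `ewens_probability` over all allelic partitions of a fixed sample size gives a quick sanity check that the formula defines a probability distribution.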