Real-Time Definition of Non-Randomness in the Distribution of Genomic Events
Open Access
- 27 June 2007
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLOS ONE
- Vol. 2 (6), e570
- https://doi.org/10.1371/journal.pone.0000570
Abstract
Features such as mutations or structural characteristics can be non-randomly or non-uniformly distributed within a genome. So far, computer simulations were required for statistical inferences on the distribution of sequence motifs. Here, we show that these analyses are possible using an analytical, mathematical approach. For the assessment of non-randomness, our calculations only require information including genome size, number of (sampled) sequence motifs and distance parameters. We have developed computer programs evaluating our analytical formulas for the real-time determination of expected values and p-values. This approach permits a flexible cluster definition that can be applied to most effectively identify non-random or non-uniform sequence motif distribution. As an example, we show the effectivity and reliability of our mathematical approach in clinical retroviral vector integration site distribution.
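To illustrate the kind of calculation the abstract describes, the following is a minimal sketch and not the authors' formulas, which are not reproduced on this page. It assumes events are placed independently and uniformly on a single linear genome treated as a continuous interval, counts pairs of events lying within a chosen distance window, and uses a Poisson approximation for the p-value. The function names, the Poisson approximation, and the example numbers are all assumptions made for illustration.

```python
import math


def pair_probability(genome_size, window):
    """P(|X - Y| <= window) for two independent uniform positions on [0, genome_size]."""
    r = min(window / genome_size, 1.0)
    return 2 * r - r * r


def expected_close_pairs(genome_size, n_events, window):
    """Expected number of event pairs within `window` of each other.

    Exact under the uniform model, by linearity of expectation.
    """
    return math.comb(n_events, 2) * pair_probability(genome_size, window)


def poisson_upper_tail(observed, mean):
    """P(X >= observed) for X ~ Poisson(mean).

    Assumption: the count of close pairs is treated as approximately Poisson.
    Pairs are not independent, so this is only an approximation, and the
    1 - CDF form is only suitable for moderate means.
    """
    if observed <= 0:
        return 1.0
    term = math.exp(-mean)  # P(X = 0)
    cdf = term
    for k in range(1, observed):
        term *= mean / k  # P(X = k) from P(X = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


if __name__ == "__main__":
    # Hypothetical inputs: genome size in bp, number of sampled events,
    # distance window in bp, and an observed count of close pairs.
    genome_size = 3_000_000_000
    n_events = 500
    window = 100_000
    observed_pairs = 20

    expected = expected_close_pairs(genome_size, n_events, window)
    p_value = poisson_upper_tail(observed_pairs, expected)
    print(f"expected close pairs: {expected:.3f}")
    print(f"approximate p-value:  {p_value:.3g}")
```

Because everything reduces to closed-form arithmetic on genome size, event count and window width, the result is available immediately, with no simulation runs. A real analysis would also need to account for multiple chromosomes and for regions that cannot be sampled, which this sketch ignores.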