Segmentation algorithm for DNA sequences
- 17 October 2005
- journal article
- research article
- Published by American Physical Society (APS) in Physical Review E
- Vol. 72 (4), 041917
- https://doi.org/10.1103/physreve.72.041917
Abstract
A new measure of the difference between two probability distributions, called the quadratic divergence, is proposed. Based on the quadratic divergence, a new segmentation algorithm is put forward that partitions a given genome or DNA sequence into compositionally distinct domains. The new algorithm was applied to the 24 human chromosome sequences, and the isochore boundaries for each chromosome were obtained. Compared with the entropic segmentation algorithm based on the Jensen-Shannon divergence, the new algorithm yielded identical coordinates for all segmentation points. An explanation of the equivalence of the two segmentation algorithms is presented. The new algorithm has a number of advantages; in particular, it is much simpler and faster than the entropy-based method. It is therefore more suitable for analyzing long genome sequences, such as the human genome and other newly sequenced eukaryotic genomes.
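The abstract does not state the formula for the quadratic divergence, so the following is only a minimal sketch under an explicit assumption: that the measure mirrors the Jensen-Shannon divergence with the Shannon entropy replaced by the sum of squared base frequencies, S(P) = sum of p_i^2. The names `sum_of_squares` and `best_split` are hypothetical and do not come from the paper. The sketch finds the single cut point that maximizes this assumed divergence between the left and right parts of a sequence; a full segmentation would apply it recursively with a stopping criterion, which the abstract does not specify.

```python
from collections import Counter

ALPHABET = "ACGT"


def sum_of_squares(counts, total):
    """S(P) = sum of p_i^2, with p_i = counts[base] / total."""
    return sum((counts[base] / total) ** 2 for base in ALPHABET)


def best_split(seq):
    """Return (position, divergence) for the cut maximizing the ASSUMED measure
    D = w_l * S(P_left) + w_r * S(P_right) - S(P_whole),
    where w_l and w_r are the length fractions of the two parts.
    Returns (None, 0.0) if no cut gives a positive divergence.
    """
    n = len(seq)
    total = Counter(seq)
    s_whole = sum_of_squares(total, n)

    left = Counter()
    best_pos, best_div = None, 0.0
    for i in range(1, n):
        # Running counts: the left part is seq[:i], the right part is seq[i:].
        left[seq[i - 1]] += 1
        right = total - left
        w_left = i / n
        w_right = 1.0 - w_left
        div = (w_left * sum_of_squares(left, i)
               + w_right * sum_of_squares(right, n - i)
               - s_whole)
        if div > best_div:
            best_pos, best_div = i, div
    return best_pos, best_div


if __name__ == "__main__":
    # Toy sequence with one obvious compositional boundary at position 10.
    print(best_split("A" * 10 + "G" * 10))
```

Because the assumed measure needs only squares of frequencies rather than logarithms, each candidate cut costs a handful of multiplications, which is consistent with the abstract's claim that the method is simpler and faster than the entropy-based one.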