Finding Borders between Coding and Noncoding DNA Regions by an Entropic Segmentation Method
- 7 August 2000
- journal article
- research article
- Published by American Physical Society (APS) in Physical Review Letters
- Vol. 85 (6), 1342-1345
- https://doi.org/10.1103/physrevlett.85.1342
Abstract
We present a new computational approach to finding borders between coding and noncoding DNA. This approach has two features: (i) DNA sequences are described by a 12-letter alphabet that captures the differential base composition at each codon position, and (ii) the search for the borders is carried out by means of an entropic segmentation method which uses only the general statistical properties of coding DNA. We find that this method is highly accurate in finding borders between coding and noncoding regions and requires no “prior training” on known data sets. Our results appear to be more accurate than those obtained with moving windows in the discrimination of coding from noncoding DNA.
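The two features named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not give these details. The sketch assumes that the 12-letter alphabet is formed by tagging each base with its sequence position modulo 3, and that candidate borders are scored with the Jensen-Shannon divergence between the symbol compositions on either side of the cut. The function names `to_12_letter`, `shannon_entropy`, and `best_cut` are hypothetical.

```python
import math
from collections import Counter


def to_12_letter(seq):
    """Tag each base with its position modulo 3 (assumed phase convention).

    Four bases times three phases gives up to 12 distinct symbols.
    """
    return [(base, i % 3) for i, base in enumerate(seq.upper())]


def shannon_entropy(counts, total):
    """Shannon entropy in bits of a table of symbol counts."""
    return -sum(
        (c / total) * math.log2(c / total) for c in counts.values() if c > 0
    )


def best_cut(symbols):
    """Return (cut, jsd) for the single split that maximizes the
    Jensen-Shannon divergence between symbols[:cut] and symbols[cut:].

    Assumption: the divergence is weighted by the relative segment lengths.
    """
    n = len(symbols)
    total = Counter(symbols)
    h_total = shannon_entropy(total, n)
    left = Counter()
    right = Counter(total)
    best = (None, 0.0)
    for cut in range(1, n):
        # Move one symbol from the right segment to the left segment.
        s = symbols[cut - 1]
        left[s] += 1
        right[s] -= 1
        jsd = (
            h_total
            - (cut / n) * shannon_entropy(left, cut)
            - ((n - cut) / n) * shannon_entropy(right, n - cut)
        )
        if jsd > best[1]:
            best = (cut, jsd)
    return best


# Toy input: 60 bases with a strict period-3 pattern, then 60 bases without one.
symbols = to_12_letter("ACG" * 20 + "T" * 60)
cut, jsd = best_cut(symbols)
print(cut, jsd)
```

On this toy input the best cut falls at position 60, the point where the period-3 pattern ends. A full segmentation would presumably apply the cut repeatedly to each resulting segment under some stopping criterion; that step is omitted here because the abstract does not describe it.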