Algorithms for Extracting Structured Motifs Using a Suffix Tree with an Application to Promoter and Regulatory Site Consensus Identification
Top Cited Papers
- 1 August 2000
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 7 (3-4), 345-362
- https://doi.org/10.1089/106652700750050826
Abstract
This paper introduces two exact algorithms for extracting conserved structured motifs from a set of DNA sequences. Structured motifs may be described as an ordered collection of p ≥ 1 "boxes" (each box corresponding to one part of the structured motif), p substitution rates (one for each box) and p - 1 intervals of distance (one for each pair of successive boxes in the collection). The contents of the boxes - that is, the motifs themselves - are unknown at the start of the algorithm. This is precisely what the algorithms are meant to find. A suffix tree is used for finding such motifs. The algorithms are efficient enough to be able to infer site consensi, such as, for instance, promoter sequences or regulatory sites, from a set of unaligned sequences corresponding to the noncoding regions upstream from all genes of a genome. In particular, both algorithms time complexity scales linearly with N2n where n is the average length of the sequences and N their number. An application to the identification of promoter and regulatory consensus sequences in bacterial genomes is shown.Keywords
This publication has 22 references indexed in Scilit:
- Functional promoter modules can be detected by formal models independent of overall nucleotide sequence similarity.Bioinformatics, 1999
- Predicting Gene Regulatory Elements in Silico on a Genomic ScaleGenome Research, 1998
- Approaches to the Automatic Discovery of Patterns in BiosequencesJournal of Computational Biology, 1998
- Mycobacterial promotersTubercle and Lung Disease, 1997
- Compilation and analysus ofBacillus SubtilisσA-dependent promoter sequences: evidence for extended contact between RNA polymerse and upstream promoter DNANucleic Acids Research, 1995
- Expectation maximization algorithm for identifying protein-binding sites with variable lengths from unaligned DNA fragmentsJournal of Molecular Biology, 1992
- Selection of DNA binding sites by regulatory proteinsJournal of Molecular Biology, 1988
- Rigorous pattern-recognition methods for DNA sequencesJournal of Molecular Biology, 1985
- Cyclic AMP Receptor Protein: Role in Transcription ActivationScience, 1984
- A Space-Economical Suffix Tree Construction AlgorithmJournal of the ACM, 1976