Algorithms for Extracting Structured Motifs Using a Suffix Tree with an Application to Promoter and Regulatory Site Consensus Identification

Top Cited Papers

1 August 2000

journal article
research article
Published by Mary Ann Liebert Inc in Journal of Computational Biology

Vol. 7 (3-4), 345-362
https://doi.org/10.1089/106652700750050826

Abstract

This paper introduces two exact algorithms for extracting conserved structured motifs from a set of DNA sequences. Structured motifs may be described as an ordered collection of p ≥ 1 "boxes" (each box corresponding to one part of the structured motif), p substitution rates (one for each box) and p - 1 intervals of distance (one for each pair of successive boxes in the collection). The contents of the boxes - that is, the motifs themselves - are unknown at the start of the algorithm. This is precisely what the algorithms are meant to find. A suffix tree is used for finding such motifs. The algorithms are efficient enough to be able to infer site consensi, such as, for instance, promoter sequences or regulatory sites, from a set of unaligned sequences corresponding to the noncoding regions upstream from all genes of a genome. In particular, both algorithms time complexity scales linearly with N²n where n is the average length of the sequences and N their number. An application to the identification of promoter and regulatory consensus sequences in bacterial genomes is shown.

Keywords

This publication has 22 references indexed in Scilit:

Functional promoter modules can be detected by formal models independent of overall nucleotide sequence similarity.
Bioinformatics, 1999
Predicting Gene Regulatory Elements in Silico on a Genomic Scale
Genome Research, 1998
Approaches to the Automatic Discovery of Patterns in Biosequences
Journal of Computational Biology, 1998
Mycobacterial promoters
Tubercle and Lung Disease, 1997
Compilation and analysus ofBacillus Subtilisσ^A-dependent promoter sequences: evidence for extended contact between RNA polymerse and upstream promoter DNA
Nucleic Acids Research, 1995
Expectation maximization algorithm for identifying protein-binding sites with variable lengths from unaligned DNA fragments
Journal of Molecular Biology, 1992
Selection of DNA binding sites by regulatory proteins
Journal of Molecular Biology, 1988
Rigorous pattern-recognition methods for DNA sequences
Journal of Molecular Biology, 1985
Cyclic AMP Receptor Protein: Role in Transcription Activation
Science, 1984
A Space-Economical Suffix Tree Construction Algorithm
Journal of the ACM, 1976

Cited by 178 articles