Identifying protein-binding sites from unaligned DNA fragments.
- 1 February 1989
- journal article
- research article
- Published in Proceedings of the National Academy of Sciences
- Vol. 86 (4), 1183-1187
- https://doi.org/10.1073/pnas.86.4.1183
Abstract
The ability to determine important features within DNA sequences from the sequences alone is becoming essential as large-scale sequencing projects are being undertaken. We present a method that can be applied to the problem of identifying the recognition pattern for a DNA-binding protein given only a collection of sequenced DNA fragments, each known to contain somewhere within it a binding site for that protein. Information about the position or orientation of the binding sites within those fragments is not needed. The method compares the "information content" of a large number of possible binding site alignments to arrive at a matrix representation of the binding site pattern. The specificity of the protein is represented as a matrix, rather than a consensus sequence, allowing patterns that are typical of regulatory protein-binding sites to be identified. The reliability of the method improves as the number of sequences increases, but the time required increases only linearly with the number of sequences. An example, using known cAMP receptor protein-binding sites, illustrates the method.
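The abstract's idea of scoring a candidate alignment by its "information content" can be illustrated with a small sketch. This is not the authors' implementation: it assumes a uniform background base frequency of 0.25, a base-2 logarithm, and no pseudocounts, and it scores only one alignment that has already been chosen. The function names and the toy sites are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"


def frequency_matrix(sites):
    """Return per-position base frequencies for equal-length aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all aligned sites must have the same length")
    matrix = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        matrix.append({b: counts.get(b, 0) / len(sites) for b in BASES})
    return matrix


def information_content(sites, background=None):
    """Sum over positions and bases of f * log2(f / p), in bits.

    Assumption: the background p defaults to 0.25 for every base.
    """
    if background is None:
        background = {b: 0.25 for b in BASES}
    total = 0.0
    for column in frequency_matrix(sites):
        for b in BASES:
            f = column[b]
            if f > 0:  # a zero frequency contributes nothing to the sum
                total += f * math.log2(f / background[b])
    return total


# Hypothetical toy alignments (not real binding sites).
conserved = ["ACGT", "ACGT", "ACGT"]
scrambled = ["ACGT", "CGTA", "GTAC", "TACG"]
print(information_content(conserved) > information_content(scrambled))
```

Under these assumptions a fully conserved position contributes 2 bits and a position with all four bases equally frequent contributes 0 bits, so an alignment that lines up the true sites should score higher than one that does not.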