High-throughput SELEX–SAGE method for quantitative modeling of transcription-factor binding sites

8 July 2002

journal article
research article
Published by Springer Nature in Nature Biotechnology

Vol. 20 (8), 831-835
https://doi.org/10.1038/nbt718

Abstract

The ability to determine the location and relative strength of all transcription-factor binding sites in a genome is important both for a comprehensive understanding of gene regulation and for effective promoter engineering in biotechnological applications. Here we present a bioinformatically driven experimental method to accurately define the DNA-binding sequence specificity of transcription factors. A generalized profile¹ was used as a predictive quantitative model for binding sites, and its parameters were estimated from in vitro–selected ligands using standard hidden Markov model training algorithms²,³. Computer simulations showed that several thousand low- to medium-affinity sequences are required to generate a profile of desired accuracy. To produce data on this scale, we applied high-throughput genomics methods to the biochemical problem addressed here. A method combining systematic evolution of ligands by exponential enrichment (SELEX)⁴ and serial analysis of gene expression (SAGE)⁵ protocols was coupled to an automated quality-controlled sequence extraction procedure based on Phred quality scores⁶. This allowed the sequencing of a database of more than 10,000 potential DNA ligands for the CTF/NFI transcription factor. The resulting binding-site model defines the sequence specificity of this protein with a high degree of accuracy not achieved earlier and thereby makes it possible to identify previously unknown regulatory sequences in genomic DNA. A covariance analysis of the selected sites revealed non-independent base preferences at different nucleotide positions, providing insight into the binding mechanism.
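The idea of a quantitative binding-site model estimated from selected ligands can be illustrated with a much simpler stand-in than the one the abstract describes. The sketch below is not the authors' generalized profile or their hidden Markov model training. It assumes pre-aligned, equal-length sites, a uniform background of 0.25 per base, and a pseudocount of 1, and builds a plain log-odds position weight matrix. The function names and the toy sequences are hypothetical and exist only for illustration.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_log_odds_matrix(sites, pseudocount=1.0, background=0.25):
    """Estimate per-position log2-odds weights from aligned, equal-length sites.

    Assumption: uniform background frequency and a flat pseudocount.
    """
    if not sites:
        raise ValueError("at least one site is required")
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")

    total = len(sites) + len(BASES) * pseudocount
    matrix = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        column = {
            base: math.log2(((counts[base] + pseudocount) / total) / background)
            for base in BASES
        }
        matrix.append(column)
    return matrix


def score_site(matrix, site):
    """Sum the per-position weights for one candidate site."""
    if len(site) != len(matrix):
        raise ValueError("site length must match matrix width")
    return sum(column[base] for column, base in zip(matrix, site))


def best_window(matrix, sequence):
    """Return (score, start) of the highest-scoring window in a longer sequence."""
    width = len(matrix)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the matrix")
    return max(
        (score_site(matrix, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


# Toy, made-up sites for illustration only (not data from the study).
toy_sites = ["TTGGC", "TTGGC", "TTGGA", "CTGGC"]
toy_matrix = build_log_odds_matrix(toy_sites)
score, start = best_window(toy_matrix, "AAAATTGGCAAAA")
print(f"best window starts at {start} with score {score:.2f}")
```

A matrix of this kind scores each position independently, so it cannot capture the non-independent base preferences that the covariance analysis in the abstract reports.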

This publication has 16 references indexed in Scilit:

Non-independence of Mnt repressor-operator interaction determined by a new quantitative multiple fluorescence relative affinity (QuMFRA) assay
Nucleic Acids Research, 2001
DNA Binding Specificity of Different STAT Proteins
Journal of Biological Chemistry, 2001
Experimental analysis and computer prediction of CTF/NFI transcription factor DNA binding sites
Journal of Molecular Biology, 2000
The mathematics of SELEX against complex targets
Journal of Molecular Biology, 1998
Prediction of complete gene structures in human genomic DNA
Journal of Molecular Biology, 1997
A flexible motif search technique based on generalized profiles
Computers & Chemistry, 1996
Serial Analysis of Gene Expression
Science, 1995
All you wanted to know about SELEX
Molecular Biology Reports, 1994
A quantitative analysis of nuclear factor I/DNA interactions
Nucleic Acids Research, 1988
Selection of DNA binding sites by regulatory proteins
Journal of Molecular Biology, 1987

Cited by 180 articles