Toucan: deciphering the cis-regulatory logic of coregulated genes
- 15 March 2003
- journal article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 31 (6), 1753-1764
- https://doi.org/10.1093/nar/gkg268
Abstract
TOUCAN is a Java application for the rapid discovery of significant cis-regulatory elements from sets of coexpressed or coregulated genes. Biologists can automatically (i) retrieve genes and intergenic regions, (ii) identify putative regulatory regions, (iii) score sequences for known transcription factor binding sites, (iv) identify candidate motifs for unknown binding sites, and (v) detect those statistically over-represented sites that are characteristic for a gene set. Genes or intergenic regions are retrieved from Ensembl or EMBL, together with orthologs and supporting information. Orthologs are aligned and syntenic regions are selected as candidate regulatory regions. Putative sites for known transcription factors are detected using our MotifScanner, which scores position weight matrices using a probabilistic model. New motifs are detected using our MotifSampler based on Gibbs sampling. Binding sites characteristic for a gene set--and thus statistically over-represented with respect to a reference sequence set--are found using a binomial test. We have validated Toucan by analyzing muscle-specific genes, liver-specific genes and E2F target genes; we have easily detected many known binding sites within intergenic DNA and identified new biologically plausible sites for known and unknown transcription factors. Software available at http://www.esat.kuleuven.ac.be/~dna/BioI/Software.html.
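The binomial over-representation test mentioned in the abstract can be illustrated with a minimal sketch. This is not Toucan's code: the abstract does not say exactly what is counted, so the sketch assumes the unit is "sequences containing at least one predicted site", and the function and parameter names are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if not 0 <= k <= n:
        raise ValueError("k must lie in [0, n]")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def overrepresentation_pvalue(hits_in_set, set_size, hits_in_reference, reference_size):
    """One-sided binomial p-value for a site being enriched in a gene set.

    Assumption (not from the abstract): the background probability is the
    fraction of reference sequences that contain the site.
    """
    p_background = hits_in_reference / reference_size
    return binomial_upper_tail(hits_in_set, set_size, p_background)


if __name__ == "__main__":
    # Hypothetical counts: site found in 8 of 10 set sequences,
    # but in only 100 of 1000 reference sequences.
    print(overrepresentation_pvalue(8, 10, 100, 1000))
```

A small p-value indicates that the site occurs in the gene set more often than the reference frequency would predict. In practice many sites are tested at once, so some correction for multiple testing would normally be applied as well.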
This publication has 37 references indexed in Scilit:
- Creating a bioinformatics nation. Nature, 2002
- Computational Detection and Location of Transcription Start Sites in Mammalian Genomic DNA. Genome Research, 2002
- Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. Proceedings of the National Academy of Sciences, 2002
- Human-mouse genome comparisons to locate regulatory sites. Nature Genetics, 2000
- PipMaker—A Web Server for Aligning Two Genomic DNA Sequences. Genome Research, 2000
- Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. Journal of Molecular Biology, 2000
- Highly specific localization of promoter regions in large genomic sequences by PromoterInspector: a novel context analysis approach. Journal of Molecular Biology, 2000
- Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. Journal of Molecular Biology, 1998
- Identification of regulatory regions which confer muscle-specific gene expression. Journal of Molecular Biology, 1998
- Sequence logos: a new way to display consensus sequences. Nucleic Acids Research, 1990