Short blocks from the noncoding parts of the human genome have instances within nearly all known genes and relate to biological processes
- 25 April 2006
- journal article
- research article
- Published in Proceedings of the National Academy of Sciences
- Vol. 103 (17), 6605-6610
- https://doi.org/10.1073/pnas.0601688103
Abstract
Using an unsupervised pattern-discovery method, we processed the human intergenic and intronic regions and catalogued all variable-length patterns with identically conserved copies and multiplicities above what is expected by chance. Among the millions of discovered patterns, we found a subset of 127,998 patterns, termed pyknons, which have additional nonoverlapping instances in the untranslated and protein-coding regions of 30,675 transcripts from 20,059 human genes. The pyknons arrange combinatorially in the untranslated and coding regions of numerous human genes where they form mosaics. Consecutive instances of pyknons in these regions show a strong bias in their relative placement, favoring distances of ≈22 nucleotides. We also found pyknons to be enriched in a statistically significant manner in genes involved in specific processes, e.g., cell communication, transcription, regulation of transcription, signaling, transport, etc. For ≈1/3 of the pyknons, the intergenic/intronic instances of their reverse complement lie within 380,084 nonoverlapping regions, typically 60–80 nucleotides long, which are predicted to form double-stranded, energetically stable, hairpin-shaped RNA secondary structures; additionally, the pyknons subsume ≈40% of the known microRNA sequences, thus suggesting a possible link with posttranscriptional gene silencing and RNA interference. Cross-genome comparisons reveal that many of the pyknons have instances in the 3′ UTRs of genes from other vertebrates and invertebrates where they are overrepresented in similar biological processes, as in the human genome. These unexpected findings suggest potential unique functional connections between the coding and noncoding parts of the human genome.
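The abstract names several bookkeeping steps without showing them: counting exact copies of a pattern, comparing that count with what chance would give, keeping only nonoverlapping instances, measuring the distance between consecutive instances, and taking a reverse complement. The toy Python sketch below illustrates those steps under stated assumptions. It is not the authors' pattern-discovery pipeline. It uses plain exact substring search. It assumes a uniform, independent background (probability 1/4 per base) for the chance estimate, uses a greedy left-to-right rule to resolve overlaps, and measures spacing from one start position to the next. All function names (`reverse_complement`, `expected_copies`, `instances`, `nonoverlapping_instances`, `spacing_histogram`) and the toy sequence are hypothetical.

```python
from collections import Counter


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A<->T, C<->G, N stays N)."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C", "N": "N"}
    return "".join(pairs[base] for base in reversed(seq))


def expected_copies(seq_len: int, k: int) -> float:
    """Expected occurrences of one fixed k-mer in a random sequence of seq_len bases.

    Assumption: uniform i.i.d. background (each base has probability 1/4).
    This is a simplification, not the statistical model used in the paper.
    """
    return max(seq_len - k + 1, 0) * 0.25 ** k


def instances(seq: str, pattern: str) -> list[tuple[int, int]]:
    """(start, end) spans of every exact copy of pattern in seq, overlaps included."""
    spans = []
    start = seq.find(pattern)
    while start != -1:
        spans.append((start, start + len(pattern)))
        start = seq.find(pattern, start + 1)
    return spans


def nonoverlapping_instances(seq: str, patterns: list[str]) -> list[tuple[int, int]]:
    """Greedy left-to-right choice of non-overlapping copies of any of the patterns."""
    candidates = sorted(span for p in patterns for span in instances(seq, p))
    chosen: list[tuple[int, int]] = []
    for start, end in candidates:
        # Keep a copy only if it begins at or after the end of the last kept copy.
        if not chosen or start >= chosen[-1][1]:
            chosen.append((start, end))
    return chosen


def spacing_histogram(spans: list[tuple[int, int]]) -> Counter:
    """Count start-to-start distances between consecutive kept instances."""
    starts = [start for start, _ in spans]
    return Counter(b - a for a, b in zip(starts, starts[1:]))


if __name__ == "__main__":
    # Toy sequence: two hypothetical "patterns" separated by runs of T.
    seq = "ACGTAC" + "T" * 16 + "GGATCC" + "T" * 16 + "ACGTAC"
    spans = nonoverlapping_instances(seq, ["ACGTAC", "GGATCC"])
    print(spans)                          # [(0, 6), (22, 28), (44, 50)]
    print(spacing_histogram(spans))       # Counter({22: 2})
    print(reverse_complement("ACGTAC"))   # GTACGT
    print(expected_copies(len(seq), 6))   # 0.010986328125
```

The toy sequence is built so that consecutive instances sit 22 nucleotides apart. It is not data from the study. On real sequences, the spacing counts would be pooled over many untranslated and coding regions before a preferred distance is sought. Any significance test, and the hairpin-structure prediction the abstract mentions, are outside the scope of this sketch.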