Organizational characteristics and information content of an archaeal genome: 156 kb of sequence from Sulfolobus solfataricus P2

Abstract
We have initiated a project to sequence the 3 Mbp genome of the thermoacidophilic archaebacterium Sulfolobus solfataricus P2. Cosmids were selected from a provisional set of minimally overlapping clones, subcloned in pUC18, and sequenced using a hybrid (random plus directed) strategy to give two blocks of contiguous unique sequence of 100389 and 56105 bp, respectively. These two contigs contain a total of 163 open reading frames (ORFs) in 26–29 putative operons; 56 ORFs could be identified with reasonable certainty. Clusters of ORFs potentially encode proteins of glycogen biosynthesis, oxidative decarboxylation of pyruvate, ATP‐dependent transport across membranes, isoprenoid biosynthesis, protein synthesis, and ribosomes. Putative promoters occur upstream of most ORFs. Thirty per cent of the predicted strong and medium‐strength promoters can initiate transcription at the start codon or within 10 nucleotides upstream, indicating a process of initial mRNA‐ribosome contact unlike that of most eubacterial genes. A novel termination motif is proposed to account for 15 additional terminations. The two contigs differ in densities of ORFs, insertion elements and repeated sequences; together they contain two copies of the previously reported insertion sequence ISC1217, five additional IS elements representing four novel types, four classes of long non‐IS repeated sequences, and numerous short, perfect repeats.