Structure of the β‐glucosidase gene bglA of Clostridium thermocellum

Abstract
The nucleotide sequence of the Clostridium thermocellum gene bglA, coding for the thermostable β-glucosidase A, has been determined. The coding region of 1344 bp was identified by comparison with the N-terminal amino acid sequence of recombinant β-glucosidase A purified from Escherichia coli. The deduced amino acid sequence corresponds to a protein of 51482 Da. The coding region is flanked by putative promoter and transcription terminator sequences. The protein is unrelated to β-glucosidase B of C. thermocellum, but has a high level of similarity with other bacterial β-glucosidases and phospho-β-glucosidases. Similarity is also observed with the β-galactosidase of the archaebacterium Sulfolobus solfataricus. Unexpectedly, it was found that human lactase-phlorizin hydrolase contains three copies of a sequence closely related to C. thermocellum β-glucosidase A (up to 40% sequence identity). These diverse β-glucosidases can therefore be grouped into an enzyme family (BGA) of common structural design. Sequence comparison by hydrophobic cluster analysis revealed that all BGA enzymes share a well-conserved region which is homologous to the catalytic domain of the widely distributed cellulase family A. A distinctive feature of this domain is the sequence motif His – Asn-Glu-Pro in which the catalytic residues His and Glu are separated by 35–55 amino acid residues. The cellulase family A and the β-glucosidase family BGA might thus be considered as members of a protein super-family comprising β-glucanases and β-glycosidases from all three primary kingdoms of living organisms.