The Clostridium thermocellum celI gene, coding for endoglucanase I (CelI), consists of an open reading frame (ORF) of 2640 nucleotides and encodes a protein of Mr 98531. The ORF was confirmed as celI by comparing the N-terminal sequence of purified recombinant CelI with that deduced from the nucleotide sequence. CelI hydrolysed lichenan and carboxymethylcellulose, but was principally active against barley β-glucan. It exhibited significant sequence identity with subfamily E2 endoglucanases and, by analogy with other members of this group, contains a catalytic domain of around 500 residues located in the N-terminal half of the protein. The C-terminal region of CelI showed strong sequence similarity to the cellulose-binding domain of the non-catalytic cellulosome subunit S1. A repeated segment, previously shown to be highly conserved in xylanase Z and in other endoglucanases from C. thermocellum, was absent from CelI. Antiserum raised against purified recombinant CelI cross-reacted with proteins in the cellulosomes of two strains of C. thermocellum, suggesting that CelI is either a component of the cellulosome or is homologous to other cellulosome proteins. A second gene, located upstream of celI, consists of an ORF of 1671 nucleotides encoding a protein of Mr 61042. Based on its homology with the Escherichia coli tar gene product, the polypeptide encoded by this second gene is tentatively identified as a sensory transducer.
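The reported ORF lengths and Mr values can be cross-checked with simple arithmetic. The sketch below is illustrative only and rests on two assumptions not stated in the abstract: that each ORF length includes the stop codon (which encodes no residue), and that each Mr refers to the full translated polypeptide rather than a processed form. Under those assumptions, the implied mean mass per residue should fall near the commonly cited average of about 110 Da. The function names are hypothetical.

```python
# Hypothetical sanity check relating ORF length to polypeptide length and Mr.
# Assumption: each ORF length includes the stop codon (no residue encoded).
# Assumption: each Mr is for the full, unprocessed translation product.

def residues_from_orf(orf_nt: int, includes_stop: bool = True) -> int:
    """Return the number of amino acids encoded by an ORF of orf_nt nucleotides."""
    if orf_nt % 3 != 0:
        raise ValueError(f"{orf_nt} nt is not a whole number of codons")
    codons = orf_nt // 3
    return codons - 1 if includes_stop else codons


def mean_residue_mass(mr: float, n_residues: int) -> float:
    """Average mass per residue (Da) implied by a relative molecular mass."""
    return mr / n_residues


# (ORF length in nucleotides, reported Mr) taken from the abstract
ORFS = {
    "celI (CelI)": (2640, 98531),
    "upstream ORF (putative sensory transducer)": (1671, 61042),
}

for name, (orf_nt, mr) in ORFS.items():
    n = residues_from_orf(orf_nt)
    print(f"{name}: {orf_nt // 3} codons, {n} residues, "
          f"{mean_residue_mass(mr, n):.1f} Da per residue")
```

Both ORF lengths are whole numbers of codons (880 and 557), and the implied mean residue masses (roughly 112 and 110 Da) are consistent with typical proteins, so the reported sizes are internally coherent under the stated assumptions.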