Reconstruction of Amino Acid Biosynthesis Pathways from the Complete Genome Sequence

Abstract
The complete genome sequence of an organism contains information that current methods for predicting gene function, which rely on piece-by-piece similarity searches of individual genes, do not fully exploit. We present a method that uses higher-level information on molecular pathways to reconstruct a complete functional unit from a set of genes. Specifically, a genome-by-genome comparison is first made to identify enzyme genes and assign EC numbers; selected portions of the metabolic pathways are then reconstructed using reference biochemical knowledge. The completeness of a reconstructed pathway serves as an indicator of the correctness of the initial gene function assignment. This approach has become possible because of our efforts to computerize the current knowledge of metabolic pathways under the KEGG project. We found that the biosynthesis pathways of all 20 amino acids were completely reconstructed in Escherichia coli, Haemophilus influenzae, and Bacillus subtilis, and probably in Synechocystis and Saccharomyces cerevisiae as well, although it was necessary to assume wider substrate specificity for aspartate aminotransferases.
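
The completeness check described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `pathway_completeness`, the gene names, and the EC strings are all hypothetical. The EC strings are deliberate placeholders (class 0 does not exist in the EC system), not real enzyme numbers. The sketch assumes that a reference pathway can be reduced to an ordered list of EC numbers and that the genome comparison yields a gene-to-EC mapping.

```python
from typing import Dict, List, Set


def pathway_completeness(
    reference_steps: List[str], assigned_ecs: Set[str]
) -> Dict[str, object]:
    """Report which reference pathway steps are covered by assigned EC numbers."""
    found = [ec for ec in reference_steps if ec in assigned_ecs]
    missing = [ec for ec in reference_steps if ec not in assigned_ecs]
    # Fraction of reference steps with at least one assigned gene.
    fraction = len(found) / len(reference_steps) if reference_steps else 0.0
    return {"found": found, "missing": missing, "fraction": fraction}


# Placeholder reference pathway: invented EC-like labels, not real enzymes.
REFERENCE_PATHWAY = ["0.0.0.1", "0.0.0.2", "0.0.0.3", "0.0.0.4"]

# Hypothetical result of the similarity-based gene function assignment.
GENE_TO_EC = {
    "geneA": "0.0.0.1",
    "geneB": "0.0.0.2",
    "geneC": "0.0.0.4",
    "geneD": "0.0.0.9",  # assigned to an enzyme outside this pathway
}

report = pathway_completeness(REFERENCE_PATHWAY, set(GENE_TO_EC.values()))
print(report["missing"])   # steps with no assigned gene
print(report["fraction"])  # share of the pathway that was reconstructed
```

A missing step in such a report does not by itself show that the enzyme is absent. It may equally point to an incorrect initial assignment or to an enzyme with wider substrate specificity covering the step, as was assumed here for aspartate aminotransferases.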