A Large Scale Analysis of cDNA in Arabidopsis thaliana: Generation of 12,028 Non-redundant Expressed Sequence Tags from Normalized and Size-selected cDNA Libraries

Abstract
For comprehensive analysis of genes expressed in the model dicotyledonous plant Arabidopsis thaliana, expressed sequence tags (ESTs) were accumulated. Normalized and size-selected cDNA libraries were constructed from aboveground organs, flower buds, roots, green siliques and liquid-cultured seedlings, and a total of 14,026 5′-end ESTs and 39,207 3′-end ESTs were obtained. The 3′-end ESTs could be clustered into 12,028 non-redundant groups. A similarity search of the non-redundant ESTs against the public non-redundant protein database indicated that 4816 groups show similarity to genes of known function and 1864 to hypothetical genes, while the remaining 5348 are novel sequences. Gene coverage by the non-redundant ESTs was analyzed using approximately 10 Mb of annotated genomic sequence on chromosomes 3 and 5. A total of 923 regions were hit by at least one EST, of which only 499 were also hit by ESTs deposited in the public database. This result indicates that the ESTs generated in this project complement the EST data in the public database and facilitate new gene discovery. The EST sequence data of individual cDNA clones are available at the web site: .
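The counts quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the original analysis. The variable names are illustrative, and reading the 923 − 499 difference as regions hit only by ESTs from this project is an assumption drawn from the wording of the abstract.

```python
# Counts as reported in the abstract.
total_groups = 12_028
categories = {
    "known function": 4816,
    "hypothetical genes": 1864,
    "novel": 5348,
}

# The three similarity classes should partition the non-redundant groups.
category_sum = sum(categories.values())
assert category_sum == total_groups

# Percentage of non-redundant groups in each class.
shares = {name: 100 * n / total_groups for name, n in categories.items()}

# Annotated regions (in ~10 Mb on chromosomes 3 and 5) hit by at least one EST.
regions_hit = 923
regions_hit_public = 499

# Assumption: the remainder are regions not hit by publicly deposited ESTs.
regions_not_in_public = regions_hit - regions_hit_public

for name, pct in shares.items():
    print(f"{name}: {pct:.1f}% of non-redundant groups")
print(f"regions without a public EST hit: {regions_not_in_public} of {regions_hit}")
```

Under that reading, 424 of the 923 regions were detected only through the ESTs generated here, which is the basis for the claim that the new data complement the public collection.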