Web‐based data warehouse on gene expression in human colorectal cancer

Abstract
Based on biomedical literature databases, we took a first step toward constructing a gene expression “data warehouse” specific to human colorectal cancer (CRC). Results of genome-wide transcriptomic research were available from 12 studies using various technologies, namely SAGE, cDNA and oligonucleotide arrays, and adaptor-tagged amplification. Three studies analyzed CRC cell lines and nine analyzed human samples. The total number of patients was 144. Out of 982 up- or down-regulated genes, 863 (88%) were found to be differentially expressed in a single study only, 88 in two studies, 22 in three studies, 7 in four studies, and only 2 genes in six studies. Eight large-scale proteomics studies in CRC have been published, using 2-D-, SDS-, or free-flow electrophoresis and involving only 11 patients. Out of 408 differentially expressed proteins, 339 (83%) were found to be differentially expressed in a single study only, 16 in three studies, 10 in four studies, 3 in five studies, and 1 in eight studies. Confirmation at the proteome level of results obtained in large-scale transcriptomics studies was possible in 25% of cases. This proportion was higher (67%) for reproducing proteome results using transcriptomics technologies. These figures show that reproducibility and overlap between published gene expression results at the proteome and transcriptome levels are low in human CRC. Thus, the development of standardized processes for collecting samples and for storing, retrieving, and querying gene expression data obtained with different technologies is of central importance in translational research.