Abstract
Analysis of a human thyroid serial analysis of gene expression (SAGE) library reveals an abundant SAGE tag corresponding to thyroglobulin (TG) mRNA. Additional, less abundant tags are present that cannot be linked to any other known gene but show considerable homology to the wild-type TG tag. To determine whether these tags represent TG mRNA molecules with alternative cleavage sites, we sequenced 3'-RACE clones. The results show that the three putative TG SAGE tags can be attributed to TG transcripts and reflect the in vivo use of alternative polyadenylation cleavage sites downstream of a single polyadenylation signal. By screening more than 300 000 sequences corresponding to human, mouse and rat transcripts for this phenomenon, we show that a considerable percentage of mRNA transcripts (44% human, 22% mouse and 22% rat) exhibit cleavage site heterogeneity. This phenomenon should be considered when analyzing SAGE-generated expression data because, according to our calculations, 2.8% of human transcripts give rise to two or more different SAGE tags for a single gene as a result of alternative cleavage site selection. Both experimental and in silico data show that the selection of the specific cleavage site for poly(A) addition at a given polyadenylation signal is more variable than previously thought.
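
The link between cleavage site choice and SAGE tag identity can be illustrated with a minimal sketch. It assumes the standard SAGE setup, which the abstract does not spell out: the tag is taken as the 10 bases immediately 3' of the 3'-most NlaIII site (CATG). When that site lies within 10 bases of the cleavage site, the tag runs into the poly(A) tail, so different cleavage positions can yield different tags for the same gene. The 3' end sequence, the cleavage positions and the function name `sage_tag` are invented for illustration; they are not the TG sequence or the authors' pipeline.

```python
from typing import Optional

ANCHOR_SITE = "CATG"     # NlaIII recognition site (assumed anchoring enzyme)
TAG_LENGTH = 10          # short SAGE tag length (assumed)
POLY_A_TAIL = "A" * 20   # stand-in for the poly(A) tail


def sage_tag(three_prime_end: str, cleavage_pos: int) -> Optional[str]:
    """Return the SAGE tag of a transcript cleaved at cleavage_pos.

    The transcript is modeled as the genomic 3' end up to (not including)
    cleavage_pos, followed by a poly(A) tail. Returns None when the
    transcript contains no anchoring site.
    """
    mrna = three_prime_end[:cleavage_pos] + POLY_A_TAIL
    site = mrna.rfind(ANCHOR_SITE)  # 3'-most anchoring site
    if site == -1:
        return None
    start = site + len(ANCHOR_SITE)
    return mrna[start:start + TAG_LENGTH]


# Hypothetical 3' end: AATAAA signal, then a CATG close to the cleavage region.
HYPOTHETICAL_3P_END = "TTGCAATAAAGGTCACATGCTGTGTCCATT"

# Hypothetical alternative cleavage positions downstream of the one signal.
tags = {pos: sage_tag(HYPOTHETICAL_3P_END, pos) for pos in (24, 26, 29, 30)}
for pos, tag in tags.items():
    print(pos, tag)
# 24 CTGTGAAAAA
# 26 CTGTGTCAAA
# 29 CTGTGTCCAT
# 30 CTGTGTCCAT
```

In this toy case four cleavage positions produce three distinct tags: positions 29 and 30 both lie at least 10 bases beyond the anchoring site, so they give the same tag. This is consistent with the observation that cleavage site heterogeneity is far more common than the fraction of genes with multiple tags.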