Search algorithm for pattern match analysis of nucleic acid sequences

Abstract
A new type of search algorithm to find biological information inherited in nucleic acid sequences was developed. The algorithm is of the pattern match type and is based on the fact that genetic information is often a function of a predictable statistical occurrence of the four bases within parts of the sequence. The search algorithm compares the known statistical pattern of bases in, for example, a promoter with an unknown sequence and calculates the statistical significance of the match at all positions in the unknown sequence. The program was tested on 54 published prokaryotic promoters. Of these, 44 or 49 could be found, with 1 or 4 false answers, respectively. The program was also used on plasmid pBR322. All promoters functioning in an in vitro transcription system were found (tet, anti-tet, p4, bla and ori) except the so-called p5 promoter. A search for donor and acceptor sites was performed in a human HLA genomic sequence that contains six introns. Five of the possible six donor and acceptor sites were found.
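The general idea of scoring every position of an unknown sequence against a base-frequency pattern can be illustrated with a minimal sketch. The abstract does not give the exact statistic used, so the sketch substitutes a plain log-odds position weight matrix with a uniform background. The function names, the pseudocount, the background frequency, and the toy sites are all assumptions made for illustration and are not taken from the paper.

```python
import math

BASES = "ACGT"


def build_log_odds(aligned_sites, pseudocount=1.0, background=0.25):
    """Build a per-position log-odds table from equal-length aligned sites.

    Assumption: uniform background (0.25 per base) and an additive
    pseudocount; the original program may weight positions differently.
    Input is expected to contain only uppercase A, C, G, T.
    """
    length = len(aligned_sites[0])
    matrix = []
    for pos in range(length):
        # Start every base at the pseudocount so unseen bases stay finite.
        counts = {base: pseudocount for base in BASES}
        for site in aligned_sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append(
            {base: math.log2((counts[base] / total) / background) for base in BASES}
        )
    return matrix


def scan(sequence, matrix):
    """Return (start, score) for every window of the unknown sequence."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[base] for column, base in zip(matrix, window))
        hits.append((start, score))
    return hits


def best_hit(sequence, matrix):
    """Return the (start, score) pair with the highest score."""
    return max(scan(sequence, matrix), key=lambda hit: hit[1])


# Toy example: four hypothetical hexamer sites, not data from the paper.
example_sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
example_matrix = build_log_odds(example_sites)
example_start, example_score = best_hit("GGCCGCTATAATGCCGG", example_matrix)
```

In this sketch a cutoff on the score would play the role of the significance threshold. A stricter or looser cutoff trades missed sites against false answers, which is one way to read the 44-versus-49 result reported above.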