A system for pattern matching applications on biosequences

Abstract
ANREP is a system for finding matches to patterns composed of (i) spacing constraints called ‘spacers’, and (ii) approximate matches to ‘motifs’ that are, recursively, patterns composed of ‘atomic’ symbols. A user specifies such patterns via a declarative, free-format and strongly typed language called A that is presented here in a tutorial style through a series of progressively more complex examples. The sample patterns are for protein and DNA sequences, the application domain for which ANREP was specifically created. ANREP provides a unified framework for almost all previously proposed biosequence patterns and extends them by providing approximate matching, a feature heretofore unavailable except for the limited case of individual sequences. The performance of ANREP is discussed and an appendix gives a concise specification of syntax and semantics. A portable C software package implementing ANREP is available via anonymous remote file transfer.