Abstract
The recognition of standard computational structures (cliches) in a program can help an experienced programmer understand the program. Based on the known relationships between the cliches, a hierarchical description of the program's design can be recovered. We develop and study a graph parsing approach to automating program recognition in which programs are represented as attributed dataflow graphs and a library of cliches is encoded as an attributed graph grammar. Graph parsing is used to recognize cliche's in the code. We demonstrate that this graph parsing approach is a feasible and useful way to automate program recognition. In studying this approach, we have experimented with two medium-sized, real-world simulator programs. There are three aspects of our study. First, we evaluate our representation's ability to suppress many common forms of program variation which hinder recognition. Second, we investigate the expressiveness of our graph grammar formalism for capturing programming cliches. Third, we empirically and analytically study the computational cost of our recognition approach with respect to the real-world simulator programs.