Factoring: a practical and robust method for scheduling parallel loops

Abstract
Loops are the prevalent source of parallelism in scientific code. To effectively schedule the iterations of such loops on multiple proces- sors, uneven processor finishing times must be optimally traded off with scheduling over- head. Scheduling large chunks of iterations reduces overhead, but increases uneveness, whereas scheduling small chunks has the opposite effect. In this paper we present a new scheduling strategy, called factoring, wherein independent iterations are scheduled in decreasing-sized chunks; the later smaller chunks are used to smooth over the uneveness of the earlier larger ones. The sizes of the chunks are chosen so that there is a high probabdity that they will be completed before the optimal finishhig time, and hence, that there is enough work (iterations) remaining to smooth over their uneveness. Factoring has been implemented as part of the RP3 runtime system for the PTRAN restructuring compiler. Experimental results comparing factoring with three well-known scheduling strategies, static chunking, self-scheduling, and guided self-scheduling, are presented. Three benchmarks with a wide variety of parallel-loop characteristics were run on 4 to 56 processors. In nearly every case, factoring outperforms the other schemes. Thus we con- clude that factoring is an extremely robust, scalable, and practical scheduling strategy for