Fast volume rendering using an efficient, scalable parallel formulation of the shear-warp algorithm

Abstract
This paper presents a fast and scalable parallel algorithm for volume rendering and its implementation on distributed-memory parallel computers. This parallel algorithm is based on the shear-warp algorithm of Lacroute and Levoy. Coupled with optimizations that exploit coherence in the volume and image space, the shear-warp algorithm is currently acknowledged to be the fastest sequential volume rendering algorithm. We have designed a memory-efficient parallel formulation of this algorithm that (1) drastically reduces communication requirements by using a novel data partitioning scheme and (2) improves multi-frame performance with an adaptive load-balancing technique. All the optimizations of the Lacroute-Levoy algorithm are preserved in the parallel formulation. The paper also provides an analytical model of performance for the parallel formulation, which shows that it is possible to sustain excellent performance across a wide range of practical problem sizes and numbers of processors. Our implementation, running on a 128-processor TMC CM-5 distributed-memory parallel computer, renders a 256^3-voxel medical data set at 12 frames/sec.