Effective hardware-based data prefetching for high-performance processors
- 1 May 1995
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Computers
- Vol. 44 (5), 609-623
- https://doi.org/10.1109/12.381947
Abstract
Memory latency and bandwidth are progressing at a much slower pace than processor performance. In this paper, we describe and evaluate the performance of three variations of a hardware function unit whose goal is to assist a data cache in prefetching data accesses so that memory latency is hidden as often as possible. The basic idea of the prefetching scheme is to keep track of data access patterns in a reference prediction table (RPT) organized as an instruction cache. The three designs differ mostly in the timing of the prefetching. In the simplest scheme (basic), prefetches can be generated one iteration ahead of actual use. The lookahead variation takes advantage of a lookahead program counter that ideally stays one memory latency time ahead of the real program counter and that is used as the control mechanism to generate the prefetches. Finally, the correlated scheme uses a more sophisticated design to detect patterns across loop levels. These designs are evaluated by simulating the ten SPEC benchmarks on a cycle-by-cycle basis. The results show that 1) the three hardware prefetching schemes all yield significant reductions in the data access penalty when compared with regular caches, 2) the benefits are greater when the hardware assist augments small on-chip caches, and 3) the lookahead scheme is the preferred one cost-performance-wise.
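The basic scheme described in the abstract can be illustrated with a small software model. The sketch below is an assumption-laden simplification of the paper's RPT: it indexes entries by the PC of each load, records the last address and observed stride, and issues a prefetch one iteration ahead once the same stride is seen twice (a collapsed version of the paper's init/transient/steady/no-prediction state machine). The class and method names are hypothetical, chosen only for illustration.

```python
# Simplified model of a reference prediction table (RPT).
# Each entry, indexed by the PC of a load instruction, holds the last
# address referenced and the last observed stride. When the same stride
# repeats, the entry is considered "steady" and a prefetch is issued for
# last_addr + stride (one loop iteration ahead, as in the basic scheme).
# This collapses the paper's four-state machine into a single flag.

class RPT:
    def __init__(self):
        self.table = {}  # pc -> (last_addr, stride, steady)

    def access(self, pc, addr):
        """Record a data access; return a prefetch address or None."""
        if pc not in self.table:
            # First time this load is seen: allocate an entry, no prediction.
            self.table[pc] = (addr, 0, False)
            return None
        last_addr, stride, steady = self.table[pc]
        new_stride = addr - last_addr
        if new_stride == stride and stride != 0:
            # Stride confirmed: predict the next iteration's address.
            self.table[pc] = (addr, stride, True)
            return addr + stride
        # Stride changed (or first repeat): retrain, issue no prefetch.
        self.table[pc] = (addr, new_stride, False)
        return None
```

For a load walking an array of 8-byte elements, the first two accesses train the entry and the third onward each trigger a prefetch of the next element's address.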