Effective hardware-based data prefetching for high-performance processors
- 1 May 1995
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Computers
- Vol. 44 (5), 609-623
- https://doi.org/10.1109/12.381947
Abstract
Memory latency and bandwidth are progressing at a much slower pace than processor performance. In this paper, we describe and evaluate the performance of three variations of a hardware function unit whose goal is to assist a data cache in prefetching data accesses so that memory latency is hidden as often as possible. The basic idea of the prefetching scheme is to keep track of data access patterns in a reference prediction table (RPT) organized as an instruction cache. The three designs differ mostly in the timing of the prefetching. In the simplest scheme (basic), prefetches can be generated one iteration ahead of actual use. The lookahead variation takes advantage of a lookahead program counter that ideally stays one memory latency time ahead of the real program counter and that is used as the control mechanism to generate the prefetches. Finally, the correlated scheme uses a more sophisticated design to detect patterns across loop levels. These designs are evaluated by simulating the ten SPEC benchmarks on a cycle-by-cycle basis. The results show that 1) the three hardware prefetching schemes all yield significant reductions in the data access penalty when compared with regular caches, 2) the benefits are greater when the hardware assist augments small on-chip caches, and 3) the lookahead scheme is the preferred one cost-performance-wise.
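The basic scheme described in the abstract can be illustrated with a small software model. The sketch below is an assumption-laden simplification of the paper's RPT: it indexes entries by the PC of each load, records the last address and observed stride, and issues a prefetch one iteration ahead once the same stride is seen twice (a collapsed version of the paper's init/transient/steady/no-prediction state machine). The class and method names are hypothetical, chosen only for illustration.

```python
# Simplified model of a reference prediction table (RPT).
# Each entry, indexed by the PC of a load instruction, holds the last
# address referenced and the last observed stride. When the same stride
# repeats, the entry is considered "steady" and a prefetch is issued for
# last_addr + stride (one loop iteration ahead, as in the basic scheme).
# This collapses the paper's four-state machine into a single flag.

class RPT:
    def __init__(self):
        self.table = {}  # pc -> (last_addr, stride, steady)

    def access(self, pc, addr):
        """Record a data access; return a prefetch address or None."""
        if pc not in self.table:
            # First time this load is seen: allocate an entry, no prediction.
            self.table[pc] = (addr, 0, False)
            return None
        last_addr, stride, steady = self.table[pc]
        new_stride = addr - last_addr
        if new_stride == stride and stride != 0:
            # Stride confirmed: predict the next iteration's address.
            self.table[pc] = (addr, stride, True)
            return addr + stride
        # Stride changed (or first repeat): retrain, issue no prefetch.
        self.table[pc] = (addr, new_stride, False)
        return None
```

For a load walking an array of 8-byte elements, the first two accesses train the entry and the third onward each trigger a prefetch of the next element's address.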