Memory bandwidth limitations of future microprocessors

1 May 1996

proceedings article
Published by Association for Computing Machinery (ACM)

Vol. 24 (2), 78-89
https://doi.org/10.1145/232973.232983

Abstract

This paper makes the case that pin bandwidth will be a critical consideration for future microprocessors. We show that many of the techniques used to tolerate growing memory latencies do so at the expense of increased bandwidth requirements. Using a decomposition of execution time, we show that for modern processors that employ aggressive memory latency tolerance techniques, wasted cycles due to insufficient bandwidth generally exceed those due to raw memory latencies. Given the importance of maximizing memory bandwidth, we calculate effective pin bandwidth, then estimate optimal effective pin bandwidth. We measure these quantities by determining the amount by which both caches and minimal-traffic caches filter accesses to the lower levels of the memory hierarchy. We see that there is a gap that can exceed two orders of magnitude between the total memory traffic generated by caches and the minimal-traffic caches, implying that the potential exists to increase effective pin bandwidth substantially. We decompose this traffic gap into four factors, and show they contribute quite differently to traffic reduction for different benchmarks. We conclude that, in the short term, pin bandwidth limitations will make more complex on-chip caches cost-effective. For example, flexible caches may allow individual applications to choose from a range of caching policies. In the long term, we predict that off-chip accesses will be so expensive that all system memory will reside on one or more processor chips.
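The abstract does not spell out its formulas, so the following is only a minimal sketch of how such a decomposition and an effective pin bandwidth figure might be computed. It assumes three hypothetical simulation runs (perfect memory, infinite bandwidth, and the full memory system) and assumes that effective pin bandwidth is the raw pin bandwidth divided by the fraction of traffic the cache lets through. All function names, parameters, and numbers are illustrative and are not taken from the paper.

```python
def decompose_execution_time(t_perfect, t_infinite_bw, t_full):
    """Split total cycles into processing, latency-stall and bandwidth-stall fractions.

    Assumed inputs (hypothetical simulation results, in cycles):
      t_perfect     -- run with a perfect memory system (no stalls)
      t_infinite_bw -- run with real latencies but unlimited bandwidth
      t_full        -- run with real latencies and real bandwidth
    """
    if not (0 < t_perfect <= t_infinite_bw <= t_full):
        raise ValueError("expected 0 < t_perfect <= t_infinite_bw <= t_full")
    latency_stall = t_infinite_bw - t_perfect    # cycles lost to raw latency
    bandwidth_stall = t_full - t_infinite_bw     # cycles lost to limited bandwidth
    return {
        "processing": t_perfect / t_full,
        "latency": latency_stall / t_full,
        "bandwidth": bandwidth_stall / t_full,
    }


def traffic_ratio(bytes_below_cache, bytes_above_cache):
    """Fraction of the processor's requested traffic that passes below the cache."""
    if bytes_above_cache <= 0:
        raise ValueError("bytes_above_cache must be positive")
    return bytes_below_cache / bytes_above_cache


def effective_pin_bandwidth(pin_bandwidth, ratio):
    """Assumed definition: raw pin bandwidth scaled up by the cache's filtering."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return pin_bandwidth / ratio


if __name__ == "__main__":
    # Made-up numbers, chosen only to show the arithmetic.
    print(decompose_execution_time(100, 120, 200))
    print(effective_pin_bandwidth(800.0, traffic_ratio(25, 100)))
```

Under these assumptions, a cache that passes a smaller fraction of traffic off chip yields a proportionally larger effective pin bandwidth, which is the sense in which the traffic gap to a minimal-traffic cache indicates room for improvement.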

Cited by 216 articles