Memory bandwidth limitations of future microprocessors
- 1 May 1996
- proceedings article
- Published by Association for Computing Machinery (ACM)
- Vol. 24 (2), 78-89
- https://doi.org/10.1145/232973.232983
Abstract
This paper makes the case that pin bandwidth will be a critical consideration for future microprocessors. We show that many of the techniques used to tolerate growing memory latencies do so at the expense of increased bandwidth requirements. Using a decomposition of execution time, we show that for modern processors that employ aggressive memory latency tolerance techniques, wasted cycles due to insufficient bandwidth generally exceed those due to raw memory latencies. Given the importance of maximizing memory bandwidth, we calculate effective pin bandwidth, then estimate optimal effective pin bandwidth. We measure these quantities by determining the amount by which both caches and minimal-traffic caches filter accesses to the lower levels of the memory hierarchy. We see that there is a gap that can exceed two orders of magnitude between the total memory traffic generated by caches and the minimal-traffic caches, implying that the potential exists to increase effective pin bandwidth substantially. We decompose this traffic gap into four factors, and show they contribute quite differently to traffic reduction for different benchmarks. We conclude that, in the short term, pin bandwidth limitations will make more complex on-chip caches cost-effective. For example, flexible caches may allow individual applications to choose from a range of caching policies. In the long term, we predict that off-chip accesses will be so expensive that all system memory will reside on one or more processor chips.