Microarchitecture Optimizations for Exploiting Memory-Level Parallelism
- 2 March 2004
- journal article
- Published by Association for Computing Machinery (ACM) in ACM SIGARCH Computer Architecture News
- Vol. 32 (2), 76
- https://doi.org/10.1145/1028176.1006708
Abstract
The performance of memory-bound commercial applications such as databases is limited by increasing memory latencies. In this paper, we show that exploiting memory-level parallelism (MLP) is an effective approach for improving the performance of these applications and that microarchitecture has a profound impact on achievable MLP. Using the epoch model of MLP, we reason how traditional microarchitecture features such as out-of-order issue and state-of-the-art microarchitecture techniques such as runahead execution affect MLP. Simulation results show that a moderately aggressive out-of-order issue processor improves MLP over an in-order issue processor by 12-30%, and that aggressive handling of loads, branches and serializing instructions is needed to attain the full benefits of large out-of-order instruction windows. The results also show that a processor's issue window and reorder buffer should be decoupled to exploit MLP more efficiently. In addition, we demonstrate that runahead execution is highly effective in enhancing MLP, potentially improving the MLP of the database workload by 82% and its overall performance by 60%. Finally, our limit study shows that there is considerable headroom in improving MLP and overall performance by implementing effective instruction prefetching, more accurate branch prediction and better value prediction in addition to runahead execution.
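For readers unfamiliar with the term, the C sketch below (not taken from the paper; an illustrative assumption based on the usual definition of MLP) contrasts a dependent pointer chase, whose cache misses serialize, with independent array lookups, whose misses an out-of-order or runahead core can keep in flight simultaneously.

```c
#include <stddef.h>

/* Dependent loads: the next address is not known until the current load
 * returns, so cache misses serialize and MLP stays near 1. */
typedef struct node { struct node *next; long payload; } node_t;

long chase(const node_t *p, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n && p != NULL; i++) {
        sum += p->payload;   /* the address of p->next depends on this load */
        p = p->next;
    }
    return sum;
}

/* Independent loads: all addresses are computable up front, so several
 * cache misses can overlap, exposing memory-level parallelism. */
long gather(const long *a, const size_t *idx, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += a[idx[i]];    /* each load is independent of the others */
    }
    return sum;
}
```

The second loop is the kind of access pattern whose latency a large instruction window or runahead execution can hide; the first is the kind that limits databases and motivates the paper's study.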