Microarchitecture Optimizations for Exploiting Memory-Level Parallelism
- 2 March 2004
- journal article
- Published by Association for Computing Machinery (ACM) in ACM SIGARCH Computer Architecture News
- Vol. 32 (2), 76
- https://doi.org/10.1145/1028176.1006708
Abstract
The performance of memory-bound commercial applications such as databases is limited by increasing memory latencies. In this paper, we show that exploiting memory-level parallelism (MLP) is an effective approach for improving the performance of these applications and that microarchitecture has a profound impact on achievable MLP. Using the epoch model of MLP, we reason how traditional microarchitecture features such as out-of-order issue and state-of-the-art microarchitecture techniques such as runahead execution affect MLP. Simulation results show that a moderately aggressive out-of-order issue processor improves MLP over an in-order issue processor by 12-30%, and that aggressive handling of loads, branches and serializing instructions is needed to attain the full benefits of large out-of-order instruction windows. The results also show that a processor's issue window and reorder buffer should be decoupled to exploit MLP more efficiently. In addition, we demonstrate that runahead execution is highly effective in enhancing MLP, potentially improving the MLP of the database workload by 82% and its overall performance by 60%. Finally, our limit study shows that there is considerable headroom in improving MLP and overall performance by implementing effective instruction prefetching, more accurate branch prediction and better value prediction in addition to runahead execution.
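For readers unfamiliar with the term, the C sketch below (not taken from the paper; an illustrative assumption based on the usual definition of MLP) contrasts a dependent pointer chase, whose cache misses serialize, with independent array lookups, whose misses an out-of-order or runahead core can keep in flight simultaneously.

```c
#include <stddef.h>

/* Dependent loads: the next address is not known until the current load
 * returns, so cache misses serialize and MLP stays near 1. */
typedef struct node { struct node *next; long payload; } node_t;

long chase(const node_t *p, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n && p != NULL; i++) {
        sum += p->payload;   /* the address of p->next depends on this load */
        p = p->next;
    }
    return sum;
}

/* Independent loads: all addresses are computable up front, so several
 * cache misses can overlap, exposing memory-level parallelism. */
long gather(const long *a, const size_t *idx, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += a[idx[i]];    /* each load is independent of the others */
    }
    return sum;
}
```

The second loop is the kind of access pattern whose latency a large instruction window or runahead execution can hide; the first is the kind that limits databases and motivates the paper's study.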