利用内存级并行性的微架构优化

Proceedings. 31st Annual International Symposium on Computer Architecture, 2004. Pub Date : 2004-06-19 DOI:10.1145/1028176.1006708

Yuan Chou, Brian Fahs, S. Abraham

{"title":"利用内存级并行性的微架构优化","authors":"Yuan Chou, Brian Fahs, S. Abraham","doi":"10.1145/1028176.1006708","DOIUrl":null,"url":null,"abstract":"The performance of memory-bound commercial applications such as databases is limited by increasing memory latencies. In this paper, we show that exploiting memory-level parallelism (MLP) is an effective approach for improving the performance of these applications and that microarchitecture has a profound impact on achievable MLP. Using the epoch model of MLP, we reason how traditional microarchitecture features such as out-of-order issue and state-of-the-art microarchitecture techniques such as runahead execution affect MLP. Simulation results show that a moderately aggressive out-of-order issue processor improves MLP over an in-order issue processor by 12-30%, and that aggressive handling of loads, branches and serializing instructions is needed to attain the full benefits of large out-of-order instruction windows. The results also show that a processor's issue window and reorder buffer should be decoupled to exploit MLP more efficiently. In addition, we demonstrate that runahead execution is highly effective in enhancing MLP, potentially improving the MLP of the database workload by 82% and its overall performance by 60%. Finally, our limit study shows that there is considerable headroom in improving MLP and overall performance by implementing effective instruction prefetching, more accurate branch prediction and better value prediction in addition to runahead execution.","PeriodicalId":268352,"journal":{"name":"Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2004-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"235","resultStr":"{\"title\":\"Microarchitecture optimizations for exploiting memory-level parallelism\",\"authors\":\"Yuan Chou, Brian Fahs, S. Abraham\",\"doi\":\"10.1145/1028176.1006708\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The performance of memory-bound commercial applications such as databases is limited by increasing memory latencies. In this paper, we show that exploiting memory-level parallelism (MLP) is an effective approach for improving the performance of these applications and that microarchitecture has a profound impact on achievable MLP. Using the epoch model of MLP, we reason how traditional microarchitecture features such as out-of-order issue and state-of-the-art microarchitecture techniques such as runahead execution affect MLP. Simulation results show that a moderately aggressive out-of-order issue processor improves MLP over an in-order issue processor by 12-30%, and that aggressive handling of loads, branches and serializing instructions is needed to attain the full benefits of large out-of-order instruction windows. The results also show that a processor's issue window and reorder buffer should be decoupled to exploit MLP more efficiently. In addition, we demonstrate that runahead execution is highly effective in enhancing MLP, potentially improving the MLP of the database workload by 82% and its overall performance by 60%. Finally, our limit study shows that there is considerable headroom in improving MLP and overall performance by implementing effective instruction prefetching, more accurate branch prediction and better value prediction in addition to runahead execution.\",\"PeriodicalId\":268352,\"journal\":{\"name\":\"Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.\",\"volume\":\"15 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2004-06-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"235\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1028176.1006708\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1028176.1006708","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 235

摘要

内存受限的商业应用程序(如数据库)的性能受到内存延迟增加的限制。在本文中，我们证明了利用内存级并行性(MLP)是提高这些应用程序性能的有效方法，并且微架构对实现的MLP具有深远的影响。使用MLP的epoch模型，我们推断传统的微架构特性(如乱序问题)和最先进的微架构技术(如提前执行)如何影响MLP。仿真结果表明，适度激进的乱序问题处理器比有序问题处理器提高了12-30%的MLP，并且需要积极处理负载、分支和序列化指令以获得大型乱序指令窗口的全部好处。结果还表明，处理器的问题窗口和重排序缓冲区应该解耦，以更有效地利用MLP。此外，我们证明了提前执行在增强MLP方面非常有效，可能将数据库工作负载的MLP提高82%，并将其总体性能提高60%。最后，我们的极限研究表明，除了运行前执行之外，通过实现有效的指令预取、更准确的分支预测和更好的值预测，在提高MLP和整体性能方面还有相当大的空间。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Microarchitecture optimizations for exploiting memory-level parallelism

The performance of memory-bound commercial applications such as databases is limited by increasing memory latencies. In this paper, we show that exploiting memory-level parallelism (MLP) is an effective approach for improving the performance of these applications and that microarchitecture has a profound impact on achievable MLP. Using the epoch model of MLP, we reason how traditional microarchitecture features such as out-of-order issue and state-of-the-art microarchitecture techniques such as runahead execution affect MLP. Simulation results show that a moderately aggressive out-of-order issue processor improves MLP over an in-order issue processor by 12-30%, and that aggressive handling of loads, branches and serializing instructions is needed to attain the full benefits of large out-of-order instruction windows. The results also show that a processor's issue window and reorder buffer should be decoupled to exploit MLP more efficiently. In addition, we demonstrate that runahead execution is highly effective in enhancing MLP, potentially improving the MLP of the database workload by 82% and its overall performance by 60%. Finally, our limit study shows that there is considerable headroom in improving MLP and overall performance by implementing effective instruction prefetching, more accurate branch prediction and better value prediction in addition to runahead execution.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.

自引率

0.00%

发文量

期刊最新文献

A content aware integer register file organization The vector-thread architecture From sequences of dependent instructions to functions: an approach for improving performance without ILP or speculation Evaluating the Imagine stream architecture Wire delay is not a problem for SMT (in the near future)