{"title":"改进多核多线程面向对象应用程序的共享缓存行为","authors":"M. Kandemir, Shekhar Srikantaiah, S. Son","doi":"10.1109/ICCAD.2011.6105315","DOIUrl":null,"url":null,"abstract":"Understanding shared cache performance when executing multithreaded object-oriented applications and optimizing these applications for multicores have not received much attention. In this paper, we first quantify the intra-thread and inter-thread cache line (block) reuse characteristics of a set of multithreaded C++ programs when executed in shared cache based multicores. Our results show that, as far as shared on-chip caches are concerned, inter-thread cache line (block) reuse distances are much higher than intra-thread cache line reuse distances. We study the impact of these characteristics on the hit/miss behavior of the shared last-level cache on a commercial multicore machine. We then show that, by rearranging accesses to the objects shared across different threads and to the objects stored in nearby memory locations, inter-thread (temporal and spatial) object reuse distances can be reduced, which in turn helps to reduce inter-thread cache line reuse distances. The results we collected using eight multithreaded applications show that our proposed shared cache-aware code restructuring strategy can reduce misses in the last-level on-chip cache of a commercial multicore machine by 25.4%, on average. These savings in cache misses translate in turn to average execution time improvement of 11.9%.","PeriodicalId":6357,"journal":{"name":"2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2011-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Improving shared cache behavior of multithreaded object-oriented applications in multicores\",\"authors\":\"M. Kandemir, Shekhar Srikantaiah, S. Son\",\"doi\":\"10.1109/ICCAD.2011.6105315\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Understanding shared cache performance when executing multithreaded object-oriented applications and optimizing these applications for multicores have not received much attention. In this paper, we first quantify the intra-thread and inter-thread cache line (block) reuse characteristics of a set of multithreaded C++ programs when executed in shared cache based multicores. Our results show that, as far as shared on-chip caches are concerned, inter-thread cache line (block) reuse distances are much higher than intra-thread cache line reuse distances. We study the impact of these characteristics on the hit/miss behavior of the shared last-level cache on a commercial multicore machine. We then show that, by rearranging accesses to the objects shared across different threads and to the objects stored in nearby memory locations, inter-thread (temporal and spatial) object reuse distances can be reduced, which in turn helps to reduce inter-thread cache line reuse distances. The results we collected using eight multithreaded applications show that our proposed shared cache-aware code restructuring strategy can reduce misses in the last-level on-chip cache of a commercial multicore machine by 25.4%, on average. 
These savings in cache misses translate in turn to average execution time improvement of 11.9%.\",\"PeriodicalId\":6357,\"journal\":{\"name\":\"2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-11-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCAD.2011.6105315\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCAD.2011.6105315","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Improving shared cache behavior of multithreaded object-oriented applications in multicores
Understanding shared cache performance when executing multithreaded object-oriented applications and optimizing these applications for multicores have not received much attention. In this paper, we first quantify the intra-thread and inter-thread cache line (block) reuse characteristics of a set of multithreaded C++ programs when executed on shared-cache-based multicores. Our results show that, as far as shared on-chip caches are concerned, inter-thread cache line (block) reuse distances are much higher than intra-thread cache line reuse distances. We study the impact of these characteristics on the hit/miss behavior of the shared last-level cache on a commercial multicore machine. We then show that, by rearranging accesses to the objects shared across different threads and to the objects stored in nearby memory locations, inter-thread (temporal and spatial) object reuse distances can be reduced, which in turn helps to reduce inter-thread cache line reuse distances. The results we collected using eight multithreaded applications show that our proposed shared cache-aware code restructuring strategy can reduce misses in the last-level on-chip cache of a commercial multicore machine by 25.4% on average. These savings in cache misses translate in turn to an average execution time improvement of 11.9%.
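To make the intra-thread versus inter-thread reuse-distance distinction concrete, the following is a minimal C++ sketch (not the authors' tool) that classifies cache line reuses from a recorded (thread id, byte address) access trace. A reuse distance here is counted as the number of distinct cache lines touched between two consecutive accesses to the same line; the 64-byte line size, the toy trace, and all names are assumptions made for illustration only.

#include <cstdint>
#include <iostream>
#include <set>
#include <unordered_map>
#include <vector>

struct Access { int tid; std::uint64_t addr; };

int main() {
    constexpr std::uint64_t kLineBytes = 64;  // assumed cache line size

    // Toy trace: two threads touching overlapping cache lines.
    std::vector<Access> trace = {
        {0, 0x1000}, {0, 0x1040}, {1, 0x1000}, {0, 0x1080},
        {1, 0x1040}, {0, 0x1000}, {1, 0x1080},
    };

    // For each line, remember the position and thread of its last access.
    struct LastUse { std::size_t pos; int tid; };
    std::unordered_map<std::uint64_t, LastUse> last;
    std::vector<std::uint64_t> lineAt;  // line touched at each trace position
    long long intraSum = 0, interSum = 0, intraCnt = 0, interCnt = 0;

    for (std::size_t i = 0; i < trace.size(); ++i) {
        std::uint64_t line = trace[i].addr / kLineBytes;
        auto it = last.find(line);
        if (it != last.end()) {
            // Reuse distance = number of distinct lines touched strictly
            // between the previous access to this line and the current one.
            std::set<std::uint64_t> distinct(lineAt.begin() + it->second.pos + 1,
                                             lineAt.begin() + i);
            long long dist = static_cast<long long>(distinct.size());
            if (it->second.tid == trace[i].tid) { intraSum += dist; ++intraCnt; }
            else                                { interSum += dist; ++interCnt; }
        }
        last[line] = {i, trace[i].tid};
        lineAt.push_back(line);
    }

    if (intraCnt) std::cout << "avg intra-thread reuse distance: "
                            << double(intraSum) / intraCnt << "\n";
    if (interCnt) std::cout << "avg inter-thread reuse distance: "
                            << double(interSum) / interCnt << "\n";
}

On traces from real multithreaded programs, a profiler of this shape would expose the gap the paper reports (inter-thread reuse distances far exceeding intra-thread ones), which is the gap the proposed code restructuring aims to shrink by bringing accesses to shared and nearby objects closer together in time.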