Locality Exists in Graph Processing: Workload Characterization on an Ivy Bridge Server

2015 IEEE International Symposium on Workload Characterization. Pub Date: 2015-10-04. DOI: 10.1109/IISWC.2015.12

S. Beamer, K. Asanović, D. Patterson

Citations: 148

Abstract

Graph processing is an increasingly important application domain and is typically communication-bound. In this work, we analyze the performance characteristics of three high-performance graph algorithm codebases using hardware performance counters on a conventional dual-socket server. Unlike many other communication-bound workloads, graph algorithms struggle to fully utilize the platform's memory bandwidth and so increasing memory bandwidth utilization could be just as effective as decreasing communication. Based on our observations of simultaneous low compute and bandwidth utilization, we find there is substantial room for a different processor architecture to improve performance without requiring a new memory system.
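The "low compute and bandwidth utilization" observation can be read as two ratios measured against hardware peaks. The following is a minimal sketch of that arithmetic, not the authors' methodology: the function names, the counter readings, and the peak figures are all hypothetical, and the 64-byte cache line is an assumption about the platform.

```python
# Illustrative only: all inputs below are made-up numbers, not measurements
# from the paper. A 64-byte cache line is assumed.
CACHE_LINE_BYTES = 64


def bandwidth_utilization(lines_transferred, elapsed_seconds, peak_bytes_per_second):
    """Fraction of peak memory bandwidth used, from a count of cache lines moved."""
    achieved_bytes_per_second = lines_transferred * CACHE_LINE_BYTES / elapsed_seconds
    return achieved_bytes_per_second / peak_bytes_per_second


def compute_utilization(instructions, cycles, peak_ipc):
    """Fraction of peak instruction throughput used, from instruction and cycle counts."""
    return (instructions / cycles) / peak_ipc


# Hypothetical counter readings for one run of a graph kernel.
bw_util = bandwidth_utilization(
    lines_transferred=500_000_000,
    elapsed_seconds=2.0,
    peak_bytes_per_second=64e9,
)
ipc_util = compute_utilization(
    instructions=2_000_000_000,
    cycles=4_000_000_000,
    peak_ipc=4.0,
)

print(f"memory bandwidth utilization: {bw_util:.1%}")
print(f"compute utilization: {ipc_util:.1%}")
```

When both ratios are well below 1, neither the memory system nor the execution units are saturated, which is the situation the abstract describes.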

Latest papers from this venue

Fast Computational GPU Design with GT-Pin

On Power-Performance Characterization of Concurrent Throughput Kernels

CRONO: A Benchmark Suite for Multithreaded Graph Algorithms Executing on Futuristic Multicores

Exploring Parallel Programming Models for Heterogeneous Computing Systems

Revealing Critical Loads and Hidden Data Locality in GPGPU Applications