Abstract: Hybrid Breadth First Search Implementation for Hybrid-Core Computers

2012 SC Companion: High Performance Computing, Networking Storage and Analysis. Pub Date: 2012-11-10. DOI: 10.1109/SC.Companion.2012.184

Kevin R. Wadleigh, John Amelio, K. Collins, G. Edwards


Citations: 4

Abstract


Summary form only given. The Graph500 benchmark is designed to evaluate the suitability of supercomputing systems for graph algorithms, which are increasingly important in HPC. The timed Graph500 kernel, Breadth First Search, exhibits memory access patterns typical of these types of applications, with poor spatial locality and synchronization between multiple streams of execution. The Graph500 benchmark was ported to the Convey HC-2ex and MX-100, hybrid-core computers with an Intel host system and a coprocessor incorporating four reprogrammable Xilinx FPGAs. The computers contain a unique memory system designed to sustain high bandwidth for random memory accesses. The BFS kernel was implemented as a hybrid algorithm with concurrent processing on both the host and coprocessor. The early steps use a top-down algorithm on the host with results copied to coprocessor memory for use in a bottom-up algorithm. The coprocessor uses thousands of threads to traverse the graph. The resulting implementation runs at over 16 billion TEPS.
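The top-down then bottom-up structure described above can be illustrated with a minimal, single-threaded sketch. It is not the authors' implementation: the abstract does not give the rule for switching directions, so the frontier-size threshold `switch_fraction` is a hypothetical parameter, and the host/coprocessor split, the copy into coprocessor memory, and the thousands of hardware threads are not modeled.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]  # undirected graph as adjacency lists


def hybrid_bfs(graph: Graph, root: int, switch_fraction: float = 0.05) -> Dict[int, int]:
    """Return a parent map for the BFS tree rooted at `root`.

    `switch_fraction` is a hypothetical tuning knob (not from the paper):
    once the frontier holds more than this fraction of all vertices, the
    search switches to bottom-up and stays there.
    """
    parent = {root: root}
    frontier = {root}
    bottom_up = False
    while frontier:
        if not bottom_up and len(frontier) > switch_fraction * len(graph):
            bottom_up = True
        next_frontier = set()
        if bottom_up:
            # Bottom-up: every unvisited vertex scans its neighbours and
            # stops at the first one found in the current frontier.
            for v in graph:
                if v in parent:
                    continue
                for u in graph[v]:
                    if u in frontier:
                        parent[v] = u
                        next_frontier.add(v)
                        break
        else:
            # Top-down: every frontier vertex claims its unvisited neighbours.
            for u in frontier:
                for v in graph[u]:
                    if v not in parent:
                        parent[v] = u
                        next_frontier.add(v)
        frontier = next_frontier
    return parent


example = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3], 5: []}
early_switch = hybrid_bfs(example, 0, switch_fraction=0.2)
top_down_only = hybrid_bfs(example, 0, switch_fraction=1.0)
```

Both calls reach the same set of vertices. The bottom-up step pays off when the frontier is large, because most unvisited vertices find a parent after checking only a few neighbours.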

