Evaluation of GPU Architectures Using Spiking Neural Networks

2011 Symposium on Application Accelerators in High-Performance Computing Pub Date : 2011-07-19 DOI:10.1109/SAAHPC.2011.20

V. Pallipuram, M. Bhuiyan, M. C. Smith


Citations: 12

Abstract

In recent years, General-Purpose Graphics Processing Units (GP-GPUs) have entered the field of High-Performance Computing (HPC) as one of the primary architectural focuses for many research groups working with complex scientific applications. Nvidia's Tesla C2050, built on the Fermi architecture, and AMD's Radeon 5870 are two devices positioned to meet the computationally demanding needs of supercomputing research groups across the globe. Though Nvidia GPUs powered by CUDA have been the frequent choice of performance-centric research groups, the introduction and growth of OpenCL have promoted AMD GP-GPUs as potential accelerator candidates that can challenge Nvidia's stronghold. Both devices offer a plethora of features for application developers to explore, and their radically different architectures call for a detailed study that weighs their merits and evaluates their potential to accelerate complex scientific applications. In this paper, we present our performance analysis research comparing Nvidia's Fermi and AMD's Radeon 5870 using OpenCL as the common programming model. We have chosen four different neuron models for Spiking Neural Networks (SNNs), each with different communication and computation requirements, namely the Izhikevich, Wilson, Morris-Lecar (ML), and Hodgkin-Huxley (HH) models. We compare the runtime performance of the Fermi and Radeon GPUs with an implementation that exhausts all optimization techniques available with OpenCL. Several equivalent architectural parameters of the two GPUs are studied and correlated with the application performance. In addition to the comparative study, our implementations achieved speed-ups of 857.3x and 658.51x on the Fermi and Radeon architectures, respectively, for the most compute-intensive HH model with a dense network containing 9.72 million neurons. The final outcome of this research is a detailed architectural comparison of the two GPU architectures with a common programming platform.
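To give a sense of the per-neuron work behind the lightest of the four models, here is a minimal single-neuron sketch of the Izhikevich model in plain Python. It is not the authors' OpenCL implementation: the function names, the forward-Euler step, the 0.5 ms step size, and the regular-spiking parameter values (a, b, c, d) are assumptions chosen for illustration from the commonly published form of the model.

```python
def izhikevich_step(v, u, current, a=0.02, b=0.2, c=-65.0, d=8.0, dt=0.5):
    """Advance one neuron by dt milliseconds; returns (v, u, fired).

    v is the membrane potential (mV) and u the recovery variable.
    Parameter defaults are assumed regular-spiking values, not taken
    from the paper.
    """
    if v >= 30.0:
        # Spike: reset the membrane potential and bump the recovery variable.
        return c, u + d, True
    # Forward-Euler update of the two coupled differential equations.
    dv = 0.04 * v * v + 5.0 * v + 140.0 - u + current
    du = a * (b * v - u)
    return v + dt * dv, u + dt * du, False


def simulate(current=10.0, steps=2000, dt=0.5):
    """Count spikes of one neuron driven by a constant input current."""
    v, u = -65.0, -13.0  # start with u = b * v for the default b
    spikes = 0
    for _ in range(steps):
        v, u, fired = izhikevich_step(v, u, current, dt=dt)
        spikes += fired
    return spikes


if __name__ == "__main__":
    print("spikes with input current 10:", simulate(10.0))
    print("spikes with no input:", simulate(0.0))
```

Each update is a handful of multiply-adds, which is why this model sits at the low end of the computational range; the Hodgkin-Huxley model integrates more state variables per neuron and is correspondingly heavier. In a GPU implementation, an update like this would typically run once per neuron per time step across many work-items.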

