Exploiting Flow Graph of System of ODEs to Accelerate the Simulation of Biologically-Detailed Neural Networks

2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2019-05-20 DOI:10.1109/IPDPS.2019.00028

Bruno R. C. Magalhães, T. Sterling, F. Schürmann, M. Hines

{"title":"Exploiting Flow Graph of System of ODEs to Accelerate the Simulation of Biologically-Detailed Neural Networks","authors":"Bruno R. C. Magalhães, T. Sterling, F. Schürmann, M. Hines","doi":"10.1109/IPDPS.2019.00028","DOIUrl":null,"url":null,"abstract":"Exposing parallelism in scientific applications has become a core requirement for efficiently running on modern distributed multicore SIMD compute architectures. The granularity of parallelism that can be attained is a key determinant for the achievable acceleration and time to solution. Motivated by a scientific use case that requires the simulation of long spans of time — the study of plasticity and learning in detailed models of brain tissue — we present a strategy that exposes and exploits multicore and SIMD micro-parallelism from unrolling flow dependencies and concurrent outputs in a large system of coupled ordinary differential equations (ODEs). An implementation of a parallel simulator is presented, running on the HPX runtime system for the ParalleX execution model, providing dynamic task-scheduling and asynchronous execution. The implementation was tested on different architectures using a previously published brain tissue model. Benchmark of single neurons on a single compute node present a speed-up of circa 4-7x when compared with the state of the art Single Instruction Multiple Data (SIMD) implementation and 13-40x over its Single Instruction Single Data (SISD) counterpart. Large scale benchmarks suggest almost ideal strong scaling and a speed-up of 2-8x on a distributed architecture of 128 Cray X6 compute nodes.","PeriodicalId":403406,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS.2019.00028","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 7

Abstract

Exposing parallelism in scientific applications has become a core requirement for efficiently running on modern distributed multicore SIMD compute architectures. The granularity of parallelism that can be attained is a key determinant for the achievable acceleration and time to solution. Motivated by a scientific use case that requires the simulation of long spans of time — the study of plasticity and learning in detailed models of brain tissue — we present a strategy that exposes and exploits multicore and SIMD micro-parallelism from unrolling flow dependencies and concurrent outputs in a large system of coupled ordinary differential equations (ODEs). An implementation of a parallel simulator is presented, running on the HPX runtime system for the ParalleX execution model, providing dynamic task-scheduling and asynchronous execution. The implementation was tested on different architectures using a previously published brain tissue model. Benchmark of single neurons on a single compute node present a speed-up of circa 4-7x when compared with the state of the art Single Instruction Multiple Data (SIMD) implementation and 13-40x over its Single Instruction Single Data (SISD) counterpart. Large scale benchmarks suggest almost ideal strong scaling and a speed-up of 2-8x on a distributed architecture of 128 Cray X6 compute nodes.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

利用ode系统的流程图加速生物精细神经网络的仿真

在科学应用程序中公开并行性已经成为在现代分布式多核SIMD计算体系结构上高效运行的核心需求。可获得的并行度粒度是可实现的加速和求解时间的关键决定因素。在一个需要长时间模拟的科学用例的激励下——在脑组织的详细模型中研究可塑性和学习——我们提出了一种策略，该策略暴露并利用了多核和SIMD微并行性，这些微并行性来自于耦合常微分方程(ode)的大型系统中的展开流依赖和并发输出。提出了一种并行模拟器的实现，该模拟器运行在HPX运行时系统上，为parallelx执行模型提供动态任务调度和异步执行。使用先前发表的脑组织模型在不同的架构上对该实现进行了测试。与单指令多数据(SIMD)实现相比，在单个计算节点上对单个神经元进行基准测试的速度提高了大约4-7倍，比单指令单数据(SISD)实现的速度提高了13-40倍。大规模基准测试表明，在128个Cray X6计算节点的分布式架构上，几乎可以实现理想的强大扩展和2-8倍的加速。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

自引率

0.00%

发文量