Accelerating iterative algorithms with asynchronous accumulative updates on FPGAs

2013 International Conference on Field-Programmable Technology (FPT) Pub Date : 2013-12-01 DOI:10.1109/FPT.2013.6718332

D. Unnikrishnan, S. Virupaksha, Lekshmi Krishnan, Lixin Gao, R. Tessier

{"title":"Accelerating iterative algorithms with asynchronous accumulative updates on FPGAs","authors":"D. Unnikrishnan, S. Virupaksha, Lekshmi Krishnan, Lixin Gao, R. Tessier","doi":"10.1109/FPT.2013.6718332","DOIUrl":null,"url":null,"abstract":"Iterative algorithms represent a pervasive class of data mining, web search and scientific computing applications. In iterative algorithms, a final result is derived by performing repetitive computations on an input data set. Existing techniques to parallelize such algorithms typically use software frameworks such as MapReduce and Hadoop to distribute data for an iteration across multiple CPU-based workstations in a cluster and collect per-iteration results. These platforms are marked by the need to synchronize data computations at iteration boundaries, impeding system performance. In this paper, we demonstrate that FPGAs in distributed computing systems can serve a vital role in breaking this synchronization barrier with the help of asynchronous accumulative updates. These updates allow for the accumulation of intermediate results for numerous data points without the need for iteration-based barriers allowing individual nodes in a cluster to independently make progress towards the final outcome. Computation is dynamically prioritized to accelerate algorithm convergence. A general-class of iterative algorithms have been implemented on a cluster of four FPGAs. A speedup of 7× is achieved over an implementation of asynchronous accumulative updates on a general-purpose CPU. The system offers up to 154× speedup versus a standard Hadoop-based CPU-workstation. Improved performance is achieved by clusters of FPGAs.","PeriodicalId":344469,"journal":{"name":"2013 International Conference on Field-Programmable Technology (FPT)","volume":"81 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 International Conference on Field-Programmable Technology (FPT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FPT.2013.6718332","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 4

Abstract

Iterative algorithms represent a pervasive class of data mining, web search and scientific computing applications. In iterative algorithms, a final result is derived by performing repetitive computations on an input data set. Existing techniques to parallelize such algorithms typically use software frameworks such as MapReduce and Hadoop to distribute data for an iteration across multiple CPU-based workstations in a cluster and collect per-iteration results. These platforms are marked by the need to synchronize data computations at iteration boundaries, impeding system performance. In this paper, we demonstrate that FPGAs in distributed computing systems can serve a vital role in breaking this synchronization barrier with the help of asynchronous accumulative updates. These updates allow for the accumulation of intermediate results for numerous data points without the need for iteration-based barriers allowing individual nodes in a cluster to independently make progress towards the final outcome. Computation is dynamically prioritized to accelerate algorithm convergence. A general-class of iterative algorithms have been implemented on a cluster of four FPGAs. A speedup of 7× is achieved over an implementation of asynchronous accumulative updates on a general-purpose CPU. The system offers up to 154× speedup versus a standard Hadoop-based CPU-workstation. Improved performance is achieved by clusters of FPGAs.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

fpga上异步累积更新的加速迭代算法

迭代算法在数据挖掘、网络搜索和科学计算应用中无处不在。在迭代算法中，通过对输入数据集进行重复计算得到最终结果。现有的算法并行化技术通常使用MapReduce和Hadoop等软件框架，在集群中跨多个基于cpu的工作站分发迭代数据，并收集每次迭代的结果。这些平台的特点是需要在迭代边界同步数据计算，从而阻碍了系统性能。在本文中，我们证明了分布式计算系统中的fpga可以在异步累积更新的帮助下打破这种同步障碍方面发挥重要作用。这些更新允许大量数据点的中间结果的积累，而不需要基于迭代的障碍，允许集群中的单个节点独立地向最终结果前进。动态优化计算优先级，加快算法收敛速度。在一个由四个fpga组成的集群上实现了一类通用的迭代算法。通过在通用CPU上实现异步累积更新，可以实现7倍的加速。与标准的基于hadoop的cpu工作站相比，该系统提供了高达154倍的加速。通过fpga集群实现性能的提高。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

2013 International Conference on Field-Programmable Technology (FPT)

自引率

0.00%

发文量