Big data genome sequencing on Zynq based clusters (abstract only)

Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. Pub Date: 2014-02-26. DOI: 10.1145/2554688.2554694

Chao Wang, Xi Li, Xuehai Zhou, Yunji Chen, R. Cheung


Citations: 8

Abstract

Next-generation sequencing (NGS) problems have attracted considerable attention from researchers in the biological and medical computing domains. Current state-of-the-art NGS machines are dramatically lowering the cost and increasing the throughput of DNA sequencing. In this paper, we present a practical study that uses Xilinx Zynq boards to build acceleration engines, combining FPGA accelerators and ARM processors, for state-of-the-art short-read mapping approaches. The heterogeneous processors and accelerators are coupled through a general Hadoop distributed processing framework. Reads are first collected by a central server and then distributed to multiple accelerators on the Zynq boards for hardware acceleration. The combination of hardware acceleration and the Map-Reduce execution flow can therefore greatly accelerate the task of aligning short reads to a known reference genome. Our approach is based on preprocessing the reference genomes and running iterative jobs that align the continuously incoming reads. The hardware acceleration is based on the established RMAP read-mapping software. Furthermore, we evaluate the speedup on a Hadoop cluster that comprises 8 development boards. Experimental results demonstrate that the proposed architecture and methods achieve a speedup of more than 112X and scale with the number of accelerators. Finally, the Zynq-based cluster has the potential to efficiently accelerate general large-scale big data applications as well. This work was supported by NSFC grants No. 61379040, No. 61272131 and No. 61202053.
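
The flow described in the abstract (preprocess the reference, distribute reads to workers, align each read, then merge the results) can be illustrated with a small software-only sketch. This is not the authors' implementation and uses neither Hadoop nor FPGA hardware: the function names (`build_index`, `map_read`, `reduce_hits`, `run_job`), the k-mer seed index, the Hamming-distance check, and the round-robin split across `workers` are all assumptions chosen for illustration. RMAP-style mapping is approximated here simply as mismatch-tolerant matching of a read against the reference.

```python
from collections import defaultdict


def build_index(reference, k):
    """Preprocessing step: record the start positions of every k-mer in the reference."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read_id, read, reference, index, k, max_mismatches):
    """Map step (hypothetical): emit (read_id, (mismatches, position)) per accepted hit.

    Non-overlapping seeds of length k are looked up exactly. If the read holds
    at least max_mismatches + 1 such seeds, the pigeonhole principle guarantees
    that every alignment within the mismatch budget shares one exact seed.
    """
    candidates = set()
    for offset in range(0, len(read) - k + 1, k):
        for pos in index.get(read[offset:offset + k], ()):
            start = pos - offset
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)
    for start in sorted(candidates):
        d = hamming(read, reference[start:start + len(read)])
        if d <= max_mismatches:
            yield read_id, (d, start)


def reduce_hits(pairs):
    """Reduce step (hypothetical): keep the hit with the fewest mismatches per read."""
    best = {}
    for read_id, hit in pairs:
        if read_id not in best or hit < best[read_id]:
            best[read_id] = hit
    return best


def run_job(reads, reference, k=4, max_mismatches=2, workers=8):
    """Simulate one job: split the reads across workers, map each chunk, then reduce."""
    index = build_index(reference, k)
    items = list(reads.items())
    # Round-robin split stands in for the central server handing reads to boards.
    chunks = [items[w::workers] for w in range(workers)]
    emitted = []
    for chunk in chunks:  # each chunk would run on a separate node in a real cluster
        for read_id, read in chunk:
            emitted.extend(map_read(read_id, read, reference, index, k, max_mismatches))
    return reduce_hits(emitted)


if __name__ == "__main__":
    ref = "ACGTTGCAAGGCTTAACCGGATCGTACGATTA"
    reads = {
        "r1": ref[4:16],                       # exact copy of a reference window
        "r2": ref[10:15] + "T" + ref[16:22],   # same length, one substituted base
        "r3": "TTTTTTTTTTTT",                  # does not occur in the reference
    }
    print(run_job(reads, ref))
```

In this sketch the chunks are processed in a plain loop; in a cluster setting each chunk would be handled by a separate node, and the Hamming-distance comparison is the kind of inner loop one would expect to offload to an accelerator.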


