GPU acceleration of finding frequent patterns over large biological sequence
Shufang Du, Longjiang Guo, Chunyu Ai, Jinbao Li, Meirui Ren, Yahong Guo
2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS), 2014-12-01
DOI: 10.1109/PADSW.2014.7097865
Citations: 0
Abstract
Frequent patterns in biological sequences usually correspond to important functions (or structures). With the rapid growth of biological sequence data, efficiently finding frequent patterns over a large biological sequence has become an important problem. However, most existing algorithms must generate many short patterns or projected databases, which severely degrades efficiency and increases space cost. Graphics processing units (GPUs), which are many-core computing devices, have been widely used to accelerate computation in many areas. To meet the needs of biologists, we redefine the frequent pattern problem with length constraints. We present a pruning optimization method for the serial algorithm (POSA) and, based on this technique, propose a parallel algorithm (POPA) that not only reduces time complexity at a low space cost but also achieves better performance on CUDA. To validate the algorithms, we implemented them on a multi-core CPU and on various GPU devices, and we applied CUDA optimization techniques to speed up computation. Experimental results show that, compared with the serial algorithm on a six-core CPU, POSA achieves a speedup of 1.2 to 4.5 times and POPA a speedup of 3 to 20 times.
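The abstract does not define the pattern model, the support measure, or the exact pruning rule used by POSA. The sketch below is a minimal, generic illustration under stated assumptions: patterns are contiguous substrings, support is the number of (possibly overlapping) occurrences, and the length constraint is a window [min_len, max_len]. Under those assumptions support is anti-monotone (every occurrence of a pattern extended by one symbol begins with an occurrence of the shorter pattern), so a level-wise search can discard any infrequent pattern before extending it. The function name `mine_frequent_patterns` and its parameters are hypothetical; this is an Apriori-style pruning sketch, not the authors' POSA or POPA implementation.

```python
def mine_frequent_patterns(seq, min_len, max_len, min_sup):
    """Hypothetical helper (not from the paper): return {pattern: support} for
    every contiguous pattern with min_len <= len(pattern) <= max_len that occurs
    at least min_sup times in seq (overlapping occurrences are counted)."""
    n = len(seq)
    result = {}

    # Level 1: start positions of every single symbol.
    occ = {}
    for i, ch in enumerate(seq):
        occ.setdefault(ch, []).append(i)

    k = 1  # current pattern length
    while occ and k <= max_len:
        # Prune: a pattern below min_sup can never grow into a frequent one,
        # because every occurrence of p + c begins with an occurrence of p.
        frequent = {p: pos for p, pos in occ.items() if len(pos) >= min_sup}

        # Only patterns inside the length window are reported; shorter
        # frequent patterns are kept solely as seeds for extension.
        if k >= min_len:
            for p, pos in frequent.items():
                result[p] = len(pos)
        if k == max_len:
            break

        # Extend each surviving occurrence by one symbol. Each occurrence is
        # handled independently, which is the kind of per-position work a
        # GPU version could assign to separate threads (assumed mapping).
        nxt = {}
        for p, pos in frequent.items():
            for i in pos:
                j = i + k
                if j < n:
                    nxt.setdefault(p + seq[j], []).append(i)
        occ = nxt
        k += 1

    return result
```

For example, `mine_frequent_patterns("ACGTACGTAC", 2, 4, 2)` reports `AC` with support 3 and `ACG` and `ACGT` with support 2, while `TACG`, which occurs only once, is pruned at length 4. Storing occurrence positions rather than rescanning the sequence keeps each extension step proportional to the number of surviving occurrences.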