Mingming Sun, Xuehai Zhou, Feng Yang, Kun Lu, Dong Dai
Bwasw-Cloud: Efficient sequence alignment algorithm for two big data with MapReduce
The Fifth International Conference on the Applications of Digital Information and Web Technologies (ICADIWT 2014), published 2014-05-15
DOI: 10.1109/ICADIWT.2014.6814662 (https://doi.org/10.1109/ICADIWT.2014.6814662)
Citations: 7
Abstract
Recent next-generation sequencing machines generate sequences at an unprecedented rate, and the sequences they produce, called reads, are no longer short. The reference sequences against which reads are aligned are also increasingly large. Efficiently mapping a large number of long reads to large reference sequences poses a new challenge for sequence alignment: the algorithm must now match two big data sets against each other. To address this problem, we propose a new parallel sequence alignment algorithm called Bwasw-Cloud, optimized for aligning long reads against large sequence data (e.g. the human genome). It is modeled after the widely used BWA-SW algorithm and uses the open-source Hadoop implementation of MapReduce. The results show that Bwasw-Cloud can match two big data sets effectively and quickly on an ordinary cluster.
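
The abstract does not describe how Bwasw-Cloud divides work between map and reduce tasks, so the following is only a minimal sketch of the general idea of MapReduce-style alignment, not the authors' method. It assumes the reference is split into chunks, that each map call scores one read against every chunk, and that the reduce step keeps the best-scoring chunk per read. Plain Smith-Waterman scoring stands in for BWA-SW, and the job runs in a single Python process rather than on Hadoop. The names `smith_waterman_score`, `map_phase`, `reduce_phase`, and `run_job`, as well as the scoring parameters and sample sequences, are all hypothetical.

```python
from collections import defaultdict


def smith_waterman_score(read, ref, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of read against ref (plain Smith-Waterman)."""
    prev = [0] * (len(ref) + 1)
    best = 0
    for i in range(1, len(read) + 1):
        curr = [0] * (len(ref) + 1)
        for j in range(1, len(ref) + 1):
            step = match if read[i - 1] == ref[j - 1] else mismatch
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, prev[j - 1] + step, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def map_phase(read_id, read, ref_chunks):
    """Map (assumed): emit (read_id, (score, chunk_id)) for every chunk."""
    for chunk_id, chunk in ref_chunks.items():
        yield read_id, (smith_waterman_score(read, chunk), chunk_id)


def reduce_phase(read_id, values):
    """Reduce (assumed): keep the best-scoring chunk for each read."""
    return read_id, max(values)


def run_job(reads, ref_chunks):
    """Simulate shuffle-and-reduce in one process; a real job would use Hadoop."""
    grouped = defaultdict(list)
    for read_id, read in reads.items():
        for key, value in map_phase(read_id, read, ref_chunks):
            grouped[key].append(value)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical toy data: two short reads and a reference split into two chunks.
reads = {"r1": "ACGT", "r2": "TTGA"}
ref_chunks = {0: "GGACGTCC", 1: "CCTTGACC"}
result = run_job(reads, ref_chunks)
print(result)
```

In this sketch each read maps to the chunk that contains it exactly. A practical design would also need overlapping chunk boundaries so that alignments spanning two chunks are not lost, which the sketch leaves out.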