High scoring segment selection for pairwise whole genome sequence alignment with the maximum scoring subsequence and GPUs
Abdulrhman Aljouie, Ling Zhong, Usman Roshan
Int. J. Comput. Biol. Drug Des., pp. 71-81. Published 2020-02-07. DOI: https://doi.org/10.1504/ijcbdd.2020.10026787
Whole genome alignment programs use string matching with hash tables to identify high scoring fragments between a query and a target sequence, around which a full alignment is then built. A recent study comparing alignment programs showed that while evolutionarily similar genomes were easy to align, divergent genomes still posed a challenge to existing methods. To fill this gap, we explore the use of the maximum scoring subsequence to identify high scoring fragments. We split the query genome into several fragments and align them to the target with a previously published parallel algorithm for short read alignment. We then pass these high scoring fragments on to the LASTZ program to obtain a more complete alignment. On simulated data we obtain an average of at least 20% higher accuracy than the alignment given by LASTZ, at the expense of a few hours of additional runtime. Our source code is freely available at http://web.njit.edu/usman/MSGA
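To make the idea of a maximum scoring subsequence concrete, the following is a minimal serial sketch in Python, not the authors' GPU implementation. It assumes an ungapped comparison with a match score of +1 and a mismatch score of -1, and a brute-force scan over target offsets; the scoring values and all function names (`split_into_fragments`, `max_scoring_subsequence`, `best_segment`) are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def split_into_fragments(seq: str, size: int) -> List[Tuple[int, str]]:
    """Cut the query into consecutive fragments, keeping each start position."""
    return [(i, seq[i:i + size]) for i in range(0, len(seq), size)]


def max_scoring_subsequence(scores: List[int]) -> Tuple[int, int, int]:
    """Return (score, start, end) of the best contiguous run, end exclusive.

    Linear-time scan: restart the running sum whenever it drops to zero
    or below, and remember the highest sum seen so far.
    """
    best_score, best_start, best_end = 0, 0, 0
    cur, cur_start = 0, 0
    for i, s in enumerate(scores):
        if cur <= 0:
            cur, cur_start = s, i
        else:
            cur += s
        if cur > best_score:
            best_score, best_start, best_end = cur, cur_start, i + 1
    return best_score, best_start, best_end


def ungapped_scores(fragment: str, window: str,
                    match: int = 1, mismatch: int = -1) -> List[int]:
    """Per-position scores for an ungapped comparison (assumed scoring)."""
    return [match if a == b else mismatch for a, b in zip(fragment, window)]


def best_segment(fragment: str, target: str) -> Tuple[int, int, int, int]:
    """Try every target offset; return (score, offset, start, end).

    start and end index into the fragment; the matching target region is
    target[offset + start : offset + end].
    """
    best = (0, 0, 0, 0)
    for offset in range(len(target) - len(fragment) + 1):
        window = target[offset:offset + len(fragment)]
        score, start, end = max_scoring_subsequence(
            ungapped_scores(fragment, window))
        if score > best[0]:
            best = (score, offset, start, end)
    return best


if __name__ == "__main__":
    target = "CCCCACGTCCCC"
    for pos, frag in split_into_fragments("TTACGTGG", 8):
        print(pos, frag, best_segment(frag, target))
```

In a pipeline like the one described, the offset loop is the part that would be parallelized, and the selected segments would be handed to a downstream aligner such as LASTZ; how that hand-off is done is not specified here.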