Optimal haplotype assembly with statistical pruning
Shreepriya Das, H. Vikalo
2014 IEEE Global Conference on Signal and Information Processing (GlobalSIP), published 2014-12-01
DOI: 10.1109/GlobalSIP.2014.7032339 (https://doi.org/10.1109/GlobalSIP.2014.7032339)
Citation count: 0
Abstract
Optimizing the commonly used minimum error correction (MEC) criterion for haplotype assembly is known to be NP-hard, so suboptimal heuristics are often used in practice. In this paper, we propose a novel method for optimal haplotype assembly based on a depth-first branch-and-bound search of the solution space. Our scheme is inspired by the sphere decoding algorithms widely used in digital communications. Using statistical information about errors in the sequencing data, we constrain the search of the haplotype space and efficiently find the optimal solution to the haplotype assembly problem. Theoretical analysis of the expected complexity of the algorithm shows that optimal haplotype assembly is practically feasible for haplotype blocks of the moderate lengths typically obtained with present-day high-throughput sequencers. The scheme is then tested on 1000 Genomes Project experimental data to verify its efficacy.
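To make the idea of a depth-first branch-and-bound search under the minimum error correction (MEC) criterion concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a diploid sample with biallelic sites, reads encoded as lists over {0, 1, None} (None marks an uncovered site), and a second haplotype that is the complement of the first. The function names `mec_cost` and `assemble_bnb` and the optional `radius` argument are hypothetical. The pruning here uses only the best cost found so far, optionally seeded with a fixed integer radius, and stands in for the statistically derived pruning rule the abstract describes.

```python
def mec_cost(reads, hap):
    """MEC cost: each read is charged its mismatches against the closer of
    hap and its complement; None marks a site the read does not cover."""
    total = 0
    for read in reads:
        covered = [(r, h) for r, h in zip(read, hap) if r is not None]
        mismatches = sum(1 for r, h in covered if r != h)
        total += min(mismatches, len(covered) - mismatches)
    return total


def assemble_bnb(reads, radius=None):
    """Depth-first branch-and-bound over haplotypes in {0, 1}^n.

    radius (hypothetical parameter): if given, only solutions with
    MEC cost <= radius are accepted, loosely mimicking a search sphere.
    Returns (haplotype, cost), or (None, None) if nothing fits the radius.
    """
    n, m = len(reads[0]), len(reads)
    best_cost = float("inf") if radius is None else radius + 1
    best_hap = None
    d_hap = [0] * m   # per-read mismatches against the partial haplotype
    d_comp = [0] * m  # per-read mismatches against its complement
    hap = []

    def search(k):
        nonlocal best_cost, best_hap
        # The prefix cost never decreases with depth, so it is a valid
        # lower bound on the cost of every completion of this prefix.
        bound = sum(min(a, b) for a, b in zip(d_hap, d_comp))
        if bound >= best_cost:
            return  # prune: this subtree cannot beat the incumbent
        if k == n:
            best_cost, best_hap = bound, tuple(hap)
            return
        # A haplotype and its complement have equal cost, so fix site 0 to 0.
        for allele in ((0,) if k == 0 else (0, 1)):
            touched = []
            for i, read in enumerate(reads):
                if read[k] is None:
                    continue
                mismatch = read[k] != allele
                if mismatch:
                    d_hap[i] += 1
                else:
                    d_comp[i] += 1
                touched.append((i, mismatch))
            hap.append(allele)
            search(k + 1)
            hap.pop()
            for i, mismatch in touched:  # undo the updates on backtrack
                if mismatch:
                    d_hap[i] -= 1
                else:
                    d_comp[i] -= 1

    search(0)
    return best_hap, (best_cost if best_hap is not None else None)


# Toy instance: five reads over four sites, with one erroneous call in the last read.
reads = [
    [0, 0, 1, None],
    [1, 1, 0, 0],
    [None, 0, 1, 1],
    [1, 1, None, 0],
    [0, 1, 1, 1],
]
print(assemble_bnb(reads))  # ((0, 0, 1, 1), 1)
```

The bound is valid because each read's mismatch counts against both the partial haplotype and its complement can only grow as more sites are fixed, so the minimum of the two can only grow as well. A tighter starting radius prunes more of the tree, at the risk of excluding every solution when it is set too small.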