RLALIGN: A Reinforcement Learning Approach for Multiple Sequence Alignment
R. Ramakrishnan, Jaspal Singh, M. Blanchette
2018 IEEE 18th International Conference on Bioinformatics and Bioengineering (BIBE), October 2018
DOI: 10.1109/BIBE.2018.00019
Citations: 8
Abstract
Multiple sequence alignment (MSA) is one of the best-studied problems in bioinformatics because of the broad set of genomic, proteomic, and evolutionary analyses that rely on it. Yet the problem is NP-hard, and existing heuristics are imperfect. Reinforcement learning (RL) techniques have recently emerged as a potential solution to a wide variety of computational problems, but have yet to be applied to MSA. In this paper, we describe RLALIGN, a method to solve the MSA problem using RL. RLALIGN is based on Asynchronous Advantage Actor-Critic (A3C), a cutting-edge RL framework. Because MSA lacks a well-defined goal state, however, the framework required several important modifications. RLALIGN can be trained to accurately align moderate-length sequences, and various heuristics allow it to scale to longer sequences. The accuracy of the alignments produced is on par with, and often better than, that of well-established alignment algorithms. Overall, our work demonstrates the potential of RL approaches for complex combinatorial problems such as MSA. RLALIGN will prove useful for realignment tasks, where portions of a larger alignment need to be optimized. Unlike classical algorithms, RLALIGN is agnostic to the nature of the scoring scheme, so it generalizes easily to a variety of problem variants.
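The abstract does not specify the environment formulation, so as a rough illustration only, one common way to cast MSA as an RL problem is: the state is a gapped alignment, an action shifts a gap within one sequence, and the reward is the resulting change in a sum-of-pairs score. The sketch below uses a toy scoring scheme; all names, scores, and the action encoding are hypothetical, not the authors' implementation.

```python
# Hypothetical sketch of an MSA-as-RL environment (not the RLALIGN code).
# State: a gapped alignment (equal-length strings over {A,C,G,T,-}).
# Action: move one gap within one sequence.
# Reward: change in the sum-of-pairs score after the move.
from itertools import combinations

MATCH, MISMATCH, GAP = 2, -1, -2  # assumed toy scoring scheme

def sum_of_pairs(alignment):
    """Sum-of-pairs score over all columns; a gap paired with anything scores GAP."""
    score = 0
    for col in zip(*alignment):
        for a, b in combinations(col, 2):
            if a == '-' or b == '-':
                score += GAP
            elif a == b:
                score += MATCH
            else:
                score += MISMATCH
    return score

def step(alignment, seq_idx, gap_pos, new_pos):
    """Apply an action: move the gap at gap_pos in sequence seq_idx to new_pos.

    Returns (new_alignment, reward), where reward is the score change --
    the kind of dense signal an actor-critic learner could train on.
    """
    row = list(alignment[seq_idx])
    assert row[gap_pos] == '-', "action must target a gap column"
    del row[gap_pos]
    row.insert(new_pos, '-')
    new_alignment = list(alignment)
    new_alignment[seq_idx] = ''.join(row)
    reward = sum_of_pairs(new_alignment) - sum_of_pairs(alignment)
    return new_alignment, reward

aln = ["AC-T", "ACGT"]
print(sum_of_pairs(aln))            # 2 + 2 - 2 + 2 = 4
new_aln, r = step(aln, 0, 2, 3)     # move the gap right: "AC-T" -> "ACT-"
print(new_aln, r)
```

Because the policy only ever sees rewards, swapping in a different scoring scheme (e.g. affine gap costs or a substitution matrix) changes nothing else in the setup, which is one way to read the abstract's claim that the method generalizes across scoring schemes.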