Multiple Anchor Staged Alignment Algorithm - Sensitive (MASAA - S)
Bharath Reddy, Richard Fields
2020 3rd International Conference on Information and Computer Technologies (ICICT), published 2020-03-01
DOI: 10.1109/ICICT50521.2020.00064 (https://doi.org/10.1109/ICICT50521.2020.00064)
Sequence alignment is widely used in computational biology and bioinformatics to determine how similar two sequences are. Many algorithms have been developed over time to align two sequences. The earliest were based on dynamic programming, which made them slow but guaranteed an optimal alignment. Today, heuristic algorithms are popular because they are faster yet still produce near-optimal alignments. In this paper, we improve on MASAA (Multiple Anchor Staged Local Sequence Alignment Algorithm), a heuristic algorithm we published previously. The new algorithm, MASAA - S, stands for MASAA Sensitive. It first uses a suffix tree data structure to identify anchors; then, to improve sensitivity, it employs adaptive seeds and shorter perfect-match seeds between the already identified anchors. When anchors are separated by a distance greater than a threshold 'd', we exclude such anchors. We tested the algorithm on randomly generated sequences and on the Rosetta dataset, with sequence lengths ranging up to 500 thousand.
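The staged idea described in the abstract (long anchors first, a distance threshold 'd', then shorter perfect-match seeds between the surviving anchors) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It rests on several assumptions: a hash index of fixed-length substrings stands in for the suffix tree, adaptive seeds are not modeled at all, "excluding" an anchor is read as dropping the later anchor of a pair whose gap exceeds 'd', and the function names (`exact_seeds`, `chain`, `filter_by_distance`, `seeds_between`) are hypothetical.

```python
from collections import defaultdict


def exact_seeds(a, b, k, a_off=0, b_off=0):
    """Return (i, j, k) for every length-k substring shared by a and b.

    Assumption: a k-mer hash index replaces the suffix tree named in the abstract.
    Offsets let callers report positions relative to the full sequences.
    """
    index = defaultdict(list)
    for i in range(len(a) - k + 1):
        index[a[i:i + k]].append(i)
    seeds = []
    for j in range(len(b) - k + 1):
        for i in index.get(b[j:j + k], ()):
            seeds.append((i + a_off, j + b_off, k))
    return seeds


def chain(seeds):
    """Greedy colinear chain: keep a seed only if it starts at or after
    the end of the previously kept seed in both sequences."""
    chained = []
    for i, j, k in sorted(seeds):
        if not chained:
            chained.append((i, j, k))
            continue
        pi, pj, pk = chained[-1]
        if i >= pi + pk and j >= pj + pk:
            chained.append((i, j, k))
    return chained


def filter_by_distance(anchors, d):
    """Drop an anchor when the gap to the previously kept anchor exceeds d
    in either sequence (one possible reading of the threshold rule)."""
    kept = []
    for i, j, k in anchors:
        if kept:
            pi, pj, pk = kept[-1]
            if i - (pi + pk) > d or j - (pj + pk) > d:
                continue
        kept.append((i, j, k))
    return kept


def seeds_between(a, b, anchors, short_k):
    """Search the region between each consecutive anchor pair for
    shorter perfect-match seeds of length short_k."""
    extra = []
    for (i1, j1, k1), (i2, j2, _) in zip(anchors, anchors[1:]):
        ga, gb = i1 + k1, j1 + k1  # first position after the left anchor
        found = exact_seeds(a[ga:i2], b[gb:j2], short_k, ga, gb)
        extra.extend(chain(found))
    return extra


# Usage on two toy sequences that share both ends and a short middle match.
seq_a = "AAAAACCGTCTTTTT"
seq_b = "AAAAAGCGTGTTTTT"
anchors = filter_by_distance(chain(exact_seeds(seq_a, seq_b, 5)), d=10)
print(anchors)
print(seeds_between(seq_a, seq_b, anchors, short_k=3))
```

A real suffix tree would find maximal matches of any length in linear time, whereas the fixed-length index above only finds matches of exactly `k` characters. The sketch is meant to show the order of the stages, not their performance.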