qPMS10: A randomized algorithm for efficiently solving quorum Planted Motif Search problem
Peng Xiao, Soumitra Pal, S. Rajasekaran. 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), December 2016. DOI: 10.1109/BIBM.2016.7822598
Discovering patterns in biological sequences is essential for extracting useful information from them. Motifs are crucial patterns with numerous applications, including the identification of transcription factors and their binding sites, composite regulatory patterns, and similarity between protein families. Several motif models have been proposed in the literature; the (l, d)-motif model is one that has been studied widely. The (l, d)-motif search problem is also known as Planted Motif Search (PMS), and the general PMS problem has been proven to be NP-hard. In this paper, we present an elegant and efficient randomized algorithm, named qPMS10, to solve PMS. Currently, the best known algorithm for solving PMS is qPMS9, a deterministic algorithm that can solve challenging (l, d)-motif instances up to (28, 12) and (30, 13). We compare the performance of qPMS10 with that of qPMS9 on standard benchmark datasets. Both theoretical and empirical analyses demonstrate that our randomized algorithm outperforms the existing algorithms for solving PMS. In addition, the random sampling techniques we employ can also be extended to other motif search problems, including Simple Motif Search (SMS) and Edit-distance based Motif Search (EMS). Furthermore, our algorithm can be parallelized efficiently and has the potential to yield large speedups on multi-core machines.
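To make the problem statement concrete, the sketch below checks the usual quorum (l, d)-motif definition by brute force. An l-mer M counts as a motif if at least q of the input sequences each contain some l-mer within Hamming distance d of M; plain PMS is the case where q equals the number of sequences. This is not qPMS10 or qPMS9. It is a naive exhaustive search over all 4^l DNA l-mers, shown only to illustrate the definition. The function names (`hamming`, `occurs_with_d_mismatches`, `quorum_motifs`) and the fixed `ACGT` alphabet are hypothetical choices made for this example.

```python
from itertools import product

# Assumption for illustration: DNA alphabet only.
ALPHABET = "ACGT"


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_with_d_mismatches(motif: str, seq: str, d: int) -> bool:
    """True if some l-mer of seq is within Hamming distance d of motif."""
    l = len(motif)
    # If seq is shorter than the motif, the range is empty and this is False.
    return any(hamming(motif, seq[i:i + l]) <= d for i in range(len(seq) - l + 1))


def quorum_motifs(seqs: list[str], l: int, d: int, q: int) -> list[str]:
    """All l-mers over ALPHABET that occur with at most d mismatches in >= q sequences.

    Hypothetical brute-force helper: it enumerates all 4**l candidates, so it is
    only practical for very small l (not the paper's algorithm).
    """
    result = []
    for letters in product(ALPHABET, repeat=l):
        motif = "".join(letters)
        # Count how many sequences contain a d-neighbor occurrence of the candidate.
        hits = sum(occurs_with_d_mismatches(motif, s, d) for s in seqs)
        if hits >= q:
            result.append(motif)
    return result


if __name__ == "__main__":
    seqs = ["ACGTA", "TACGT", "ACGTT"]
    # With d = 0 and q = 3, only an l-mer shared exactly by all three sequences qualifies.
    print(quorum_motifs(seqs, l=4, d=0, q=3))  # prints ['ACGT']
```

The candidate space grows as 4^l, and the number of qualifying motifs grows quickly with d. This is why exact algorithms such as qPMS9 need far more careful pruning than this sketch to reach instances like (28, 12) or (30, 13).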