qPMS10: A randomized algorithm for efficiently solving quorum Planted Motif Search problem

Peng Xiao, Soumitra Pal, S. Rajasekaran
2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), December 2016. DOI: 10.1109/BIBM.2016.7822598. Citations: 11.

Abstract

Discovering patterns in biological sequences is essential for extracting useful information from them. Motifs are crucial patterns with numerous applications, including the identification of transcription factors and their binding sites, composite regulatory patterns, and similarity between families of proteins. Several motif models have been proposed in the literature, and the (l, d)-motif model is one of the most widely studied. The (l, d)-motif search problem is also known as Planted Motif Search (PMS), and the general PMS problem has been proven to be NP-hard. In this paper, we present an elegant and efficient randomized algorithm, named qPMS10, to solve PMS. Currently, the best known algorithm for solving PMS is qPMS9, a deterministic algorithm that can solve challenging (l, d)-motif instances up to (28, 12) and (30, 13). We compare the performance of qPMS10 with that of qPMS9 on standard benchmark datasets. Both theoretical and empirical analyses demonstrate that our randomized algorithm outperforms the existing algorithms for solving PMS. In addition, the random sampling techniques we employ can be extended to other motif search problems, including Simple Motif Search (SMS) and Edit-distance based Motif Search (EMS). Furthermore, our algorithm can be parallelized efficiently and has the potential to yield large speedups on multi-core machines.
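The following minimal Python sketch makes the problem statement concrete. An l-mer M is an (l, d)-motif if every input sequence contains some l-mer within Hamming distance d of M. In the quorum variant, such an occurrence is required in only at least q percent of the sequences.

The sketch makes several assumptions. It works over the DNA alphabet ACGT and takes q as a percentage. All function names (hamming, occurs_within, is_quorum_motif, neighborhood, qpms_brute_force) are hypothetical. It is not qPMS10 or qPMS9: it is a naive exhaustive search that is practical only for tiny instances.

```python
import math

ALPHABET = "ACGT"  # assumption: DNA input


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif, seq, d):
    """True if some l-mer of seq is within Hamming distance d of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d for i in range(len(seq) - l + 1))


def is_quorum_motif(motif, seqs, d, q):
    """True if motif occurs (at most d mismatches) in at least q percent of seqs."""
    hits = sum(occurs_within(motif, s, d) for s in seqs)
    return hits * 100 >= q * len(seqs)


def neighborhood(x, d, alphabet=ALPHABET):
    """All strings within Hamming distance d of x (x itself included)."""
    result = {x}
    frontier = {x}
    for _ in range(d):
        nxt = set()
        for y in frontier:
            for i, ch in enumerate(y):
                for c in alphabet:
                    if c != ch:
                        nxt.add(y[:i] + c + y[i + 1:])
        result |= nxt
        frontier = nxt
    return result


def qpms_brute_force(seqs, l, d, q):
    """Hypothetical exhaustive (l, d, q)-motif search; NOT the qPMS10 algorithm."""
    n = len(seqs)
    need = math.ceil(q * n / 100)  # minimum number of sequences with an occurrence
    # A quorum motif occurs in >= need sequences, so at least one of the
    # first n - need + 1 sequences must contain an occurrence of it.
    candidates = set()
    for s in seqs[: n - need + 1]:
        for i in range(len(s) - l + 1):
            candidates |= neighborhood(s[i:i + l], d)
    return sorted(m for m in candidates if is_quorum_motif(m, seqs, d, q))


if __name__ == "__main__":
    # "ACGT" is planted with 0, 1 and 1 mismatches in the first three sequences.
    seqs = ["TTACGTTT", "GGACCTGG", "CCAGGTCC", "TTTTTTTT"]
    print("ACGT" in qpms_brute_force(seqs, 4, 1, 75))
```

The search relies on one pruning rule. A quorum motif must occur in at least ceil(qn/100) of the n sequences, so candidates only need to be drawn from the first n - ceil(qn/100) + 1 sequences. Even so, each l-mer contributes a neighborhood whose size is the sum over i ≤ d of C(l, i)·3^i. This exponential growth in d is why practical PMS algorithms depend on far more careful candidate pruning than this sketch.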