Iterative sequential Monte Carlo algorithm for motif discovery

M. Bataineh, Z. Al-qudah, A. Al-Zaben
IET Signal Process., Journal Article, published 2016-07-01. DOI: 10.1049/iet-spr.2014.0356 (https://doi.org/10.1049/iet-spr.2014.0356)
Citations: 3

Abstract

Discovering motifs in transcription factor binding sites matters for the transcription process and is crucial for understanding gene regulatory relationships and evolutionary history. Identifying weak motifs while reducing the effects of local optima, error propagation and computational complexity remains an important but challenging task in motif discovery. This study proposes an iterative sequential Monte Carlo (ISMC) motif discovery algorithm, based on the position weight matrix and the Gibbs sampling model, to locate conserved motifs in a given set of nucleotide sequences. Three sub-algorithms are proposed. Algorithm 1 (see Fig. 1) handles the case of one motif instance of fixed length in each nucleotide sequence. The ISMC algorithm is then extended to more complex situations: a unique motif of unknown length (Algorithm 2), a unique motif of unknown abundance (Algorithm 3; see Fig. 2), and multiple motifs. Experimental results on both synthetic and real datasets show that the proposed ISMC algorithm outperforms five other widely used motif discovery algorithms in terms of nucleotide- and site-level sensitivity, nucleotide- and site-level positive prediction value, nucleotide-level performance coefficient, nucleotide-level correlation coefficient and site-level average site performance.
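The abstract names two building blocks, the position weight matrix (PWM) and Gibbs sampling, without giving the details of ISMC itself. As a rough illustration of how those two pieces fit together in the simplest setting (one motif instance of fixed length per sequence), here is a minimal sketch of a classic Gibbs site sampler. It is not the authors' ISMC algorithm; the function names, the pseudocount, the number of sweeps and the toy sequences are all assumptions made for this example.

```python
import random

ALPHABET = "ACGT"


def build_pwm(sequences, starts, width, skip=None, pseudocount=1.0):
    """Column-wise nucleotide frequencies over the current motif sites.

    The sequence at index `skip` (if given) is left out, as in a Gibbs update.
    """
    counts = [{b: pseudocount for b in ALPHABET} for _ in range(width)]
    for i, (seq, start) in enumerate(zip(sequences, starts)):
        if i == skip:
            continue
        for j in range(width):
            counts[j][seq[start + j]] += 1.0
    pwm = []
    for col in counts:
        total = sum(col.values())
        pwm.append({b: c / total for b, c in col.items()})
    return pwm


def background(sequences, pseudocount=1.0):
    """Overall nucleotide frequencies, used as a simple background model."""
    counts = {b: pseudocount for b in ALPHABET}
    for seq in sequences:
        for ch in seq:
            counts[ch] += 1.0
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}


def site_weights(seq, pwm, bg):
    """Motif-vs-background likelihood ratio for every possible start position."""
    width = len(pwm)
    weights = []
    for start in range(len(seq) - width + 1):
        w = 1.0
        for j in range(width):
            base = seq[start + j]
            w *= pwm[j][base] / bg[base]
        weights.append(w)
    return weights


def gibbs_site_sampler(sequences, width, sweeps=200, seed=0):
    """Sample one motif start per sequence; returns (starts, final PWM)."""
    rng = random.Random(seed)
    bg = background(sequences)
    starts = [rng.randrange(len(s) - width + 1) for s in sequences]
    for _ in range(sweeps):
        for i, seq in enumerate(sequences):
            # Rebuild the PWM without sequence i, then resample its site.
            pwm = build_pwm(sequences, starts, width, skip=i)
            weights = site_weights(seq, pwm, bg)
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts, build_pwm(sequences, starts, width)


if __name__ == "__main__":
    # Toy input (hypothetical): each sequence contains the planted word TTGACC.
    toy = ["ACGTTGACCTAGGT", "TTGACCTAACGGAT", "GGCATTGACCTACA", "CATGCATTGACCTA"]
    found_starts, found_pwm = gibbs_site_sampler(toy, width=6)
    for seq, start in zip(toy, found_starts):
        print(seq, start, seq[start:start + 6])
```

Because each site is drawn at random in proportion to its likelihood ratio, the sampler can settle on a shifted or spurious alignment, which is the local-optimum problem the abstract refers to. Restarting from several seeds and keeping the best-scoring alignment is a common workaround in plain Gibbs sampling.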