Expectation Maximization based algorithm applied to DNA sequence motif finder

J. C. Garbelini, D. Sanches, A. Pozo
Published in: 2022 IEEE Congress on Evolutionary Computation (CEC), 2022-07-18
DOI: 10.1109/CEC55065.2022.9870303 (https://doi.org/10.1109/CEC55065.2022.9870303)
Citations: 3

Abstract

Finding transcription factor binding sites is an important problem in bioinformatics. Correctly identifying them in the promoter regions of co-expressed genes is a crucial step toward understanding gene expression mechanisms and developing new drugs and vaccines. Motif finding consists of searching for conserved patterns in datasets of biological sequences using unsupervised learning algorithms. It is considered one of the open problems of computational biology, and its simplest formulation has been proven to be NP-hard. Heuristic and meta-heuristic algorithms have been shown to be very promising for combinatorial problems with very large search spaces. In this paper we propose a new algorithm called Biomapp (Biological Motif Application), based on canonical Expectation Maximization, that uses the Kullback-Leibler divergence to re-estimate the parameters of the statistical model. The algorithm is embedded in an Iterated Local Search as the local search step, and a hierarchical perturbation operator is used to escape from local optima. The results obtained with this approach were compared with those of the state-of-the-art algorithm MEME (Multiple EM for Motif Elicitation), showing that Biomapp outperformed this classical technique on several datasets.
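
To make the overall structure concrete, the following is a minimal Python sketch of an EM local search wrapped in an Iterated Local Search loop. It is not the Biomapp implementation: it assumes a one-occurrence-per-sequence model, a uniform background, and a plain random column perturbation in place of the hierarchical operator, and it uses the Kullback-Leibler divergence only to score candidate motifs, not to re-estimate parameters as the paper does. All function names and parameters here are hypothetical.

```python
import math
import random

ALPHABET = "ACGT"
IDX = {base: i for i, base in enumerate(ALPHABET)}


def kl_divergence(pwm, background):
    """Sum over motif columns of KL(column || background), in bits."""
    return sum(
        p * math.log2(p / q)
        for col in pwm
        for p, q in zip(col, background)
        if p > 0
    )


def em_step(seqs, pwm, background, pseudo=0.1):
    """One EM iteration, assuming one motif occurrence per sequence."""
    w = len(pwm)
    counts = [[pseudo] * 4 for _ in range(w)]
    for s in seqs:
        # E-step: weight of each start position (motif vs. background ratio).
        weights = []
        for start in range(len(s) - w + 1):
            ratio = 1.0
            for j in range(w):
                b = IDX[s[start + j]]
                ratio *= pwm[j][b] / background[b]
            weights.append(ratio)
        total = sum(weights)
        # M-step accumulation: expected base counts per motif column.
        for start, wt in enumerate(weights):
            for j in range(w):
                counts[j][IDX[s[start + j]]] += wt / total
    return [[c / sum(col) for c in col] for col in counts]


def local_search(seqs, pwm, background, iters=50):
    """Run a fixed number of EM iterations from the given starting model."""
    for _ in range(iters):
        pwm = em_step(seqs, pwm, background)
    return pwm


def random_pwm(w, rng):
    """Random strictly positive position weight matrix with w columns."""
    pwm = []
    for _ in range(w):
        col = [rng.random() + 0.1 for _ in range(4)]
        pwm.append([c / sum(col) for c in col])
    return pwm


def perturb(pwm, rng, n_cols=1):
    """Re-randomize n_cols columns (assumed stand-in for the paper's operator)."""
    new = [col[:] for col in pwm]
    for j in rng.sample(range(len(pwm)), n_cols):
        new[j] = random_pwm(1, rng)[0]
    return new


def iterated_local_search(seqs, w, restarts=10, seed=0):
    """ILS: perturb the best model, rerun EM, keep the candidate if it scores higher."""
    rng = random.Random(seed)
    background = [0.25] * 4  # assumed uniform background
    best = local_search(seqs, random_pwm(w, rng), background)
    best_score = kl_divergence(best, background)
    for _ in range(restarts):
        cand = local_search(seqs, perturb(best, rng), background)
        score = kl_divergence(cand, background)
        if score > best_score:
            best, best_score = cand, score
    return best, best_score


def consensus(pwm):
    """Most probable base in each motif column."""
    return "".join(ALPHABET[max(range(4), key=col.__getitem__)] for col in pwm)
```

Calling `iterated_local_search(seqs, w=8)` on a list of uppercase A/C/G/T strings returns the best position weight matrix found and its score, and `consensus` turns that matrix into a readable motif string. Accepting a candidate only when its score improves is the simplest ILS acceptance rule; other criteria, such as model likelihood, would fit the same loop.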