寻找核苷酸相邻依赖DNA基序的最优算法

Francis Y. L. Chin, Henry C. M. Leung, M. Siu, S. Yiu
{"title":"寻找核苷酸相邻依赖DNA基序的最优算法","authors":"Francis Y. L. Chin, Henry C. M. Leung, M. Siu, S. Yiu","doi":"10.1142/9781848161092_0035","DOIUrl":null,"url":null,"abstract":"Finding motifs and the corresponding binding sites is a critical and challenging problem in studying the process of gene expression. String and matrix representations are two popular models to represent a motif. However, both representations share an important weakness by assuming that the occurrence of a nucleotide in a binding site is independent of other nucleotides. More complicated representations, such as HMM or regular expression, exist that can capture the nucleotide dependency. Unfortunately, these models are not practical (with too many parameters and require many known binding sites). Recently, Chin and h u n g introduced the SPSP representation which overcomes the limitations of these complicated models. However, discovering novel motifs in SPSP representation is still a NP-hard problem. In this paper, based on our observations in real binding sites, we propose a simpler model, the Dependency Pattern Sets (DPS) representation, which is simpler than the SPSP model but can still capture the nucleotide dependency. We develop a branch and bound algorithm (DPS-Finder) for finding optimal DPS motifs. Experimental results show that DPS-Finder can discover a length-10 motif from 22 length500 DNA sequences within a few minutes and the DPS representation has a similar performance as SPSP representation.","PeriodicalId":74513,"journal":{"name":"Proceedings of the ... Asia-Pacific bioinformatics conference","volume":"24 1","pages":"343-352"},"PeriodicalIF":0.0000,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Optimal Algorithm for Finding DNA Motifs with Nucleotide Adjacent Dependency\",\"authors\":\"Francis Y. L. Chin, Henry C. M. Leung, M. Siu, S. Yiu\",\"doi\":\"10.1142/9781848161092_0035\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Finding motifs and the corresponding binding sites is a critical and challenging problem in studying the process of gene expression. String and matrix representations are two popular models to represent a motif. However, both representations share an important weakness by assuming that the occurrence of a nucleotide in a binding site is independent of other nucleotides. More complicated representations, such as HMM or regular expression, exist that can capture the nucleotide dependency. Unfortunately, these models are not practical (with too many parameters and require many known binding sites). Recently, Chin and h u n g introduced the SPSP representation which overcomes the limitations of these complicated models. However, discovering novel motifs in SPSP representation is still a NP-hard problem. In this paper, based on our observations in real binding sites, we propose a simpler model, the Dependency Pattern Sets (DPS) representation, which is simpler than the SPSP model but can still capture the nucleotide dependency. We develop a branch and bound algorithm (DPS-Finder) for finding optimal DPS motifs. Experimental results show that DPS-Finder can discover a length-10 motif from 22 length500 DNA sequences within a few minutes and the DPS representation has a similar performance as SPSP representation.\",\"PeriodicalId\":74513,\"journal\":{\"name\":\"Proceedings of the ... Asia-Pacific bioinformatics conference\",\"volume\":\"24 1\",\"pages\":\"343-352\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2007-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the ... Asia-Pacific bioinformatics conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1142/9781848161092_0035\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... Asia-Pacific bioinformatics conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1142/9781848161092_0035","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

摘要

寻找基序和相应的结合位点是研究基因表达过程的一个关键和具有挑战性的问题。字符串表示和矩阵表示是两种常用的表示图案的模型。然而,这两种表述都有一个重要的弱点,即假设一个核苷酸在结合位点的发生与其他核苷酸无关。存在更复杂的表示,如HMM或正则表达式,可以捕获核苷酸依赖性。不幸的是,这些模型并不实用(参数太多,需要许多已知的结合位点)。最近,Chin和hu引入了SPSP表示,克服了这些复杂模型的局限性。然而,在SPSP表示中发现新的基序仍然是一个np难题。在本文中,基于我们对实际结合位点的观察,我们提出了一个更简单的模型,即依赖模式集(Dependency Pattern Sets, DPS)表示,它比SPSP模型更简单,但仍然可以捕获核苷酸依赖性。我们开发了一个分支定界算法(DPS- finder)来寻找最优的DPS基序。实验结果表明,DPS- finder可以在几分钟内从22个长度为500的DNA序列中发现长度为10的基序,DPS表示与SPSP表示具有相似的性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Optimal Algorithm for Finding DNA Motifs with Nucleotide Adjacent Dependency
Finding motifs and the corresponding binding sites is a critical and challenging problem in studying the process of gene expression. String and matrix representations are two popular models to represent a motif. However, both representations share an important weakness by assuming that the occurrence of a nucleotide in a binding site is independent of other nucleotides. More complicated representations, such as HMM or regular expression, exist that can capture the nucleotide dependency. Unfortunately, these models are not practical (with too many parameters and require many known binding sites). Recently, Chin and h u n g introduced the SPSP representation which overcomes the limitations of these complicated models. However, discovering novel motifs in SPSP representation is still a NP-hard problem. In this paper, based on our observations in real binding sites, we propose a simpler model, the Dependency Pattern Sets (DPS) representation, which is simpler than the SPSP model but can still capture the nucleotide dependency. We develop a branch and bound algorithm (DPS-Finder) for finding optimal DPS motifs. Experimental results show that DPS-Finder can discover a length-10 motif from 22 length500 DNA sequences within a few minutes and the DPS representation has a similar performance as SPSP representation.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Tuning Privacy-Utility Tradeoff in Genomic Studies Using Selective SNP Hiding. The Future of Bioinformatics CHEMICAL COMPOUND CLASSIFICATION WITH AUTOMATICALLY MINED STRUCTURE PATTERNS. Predicting Nucleolar Proteins Using Support-Vector Machines Proceedings of the 6th Asia-Pacific Bioinformatics Conference, APBC 2008, 14-17 January 2008, Kyoto, Japan
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1