Discriminative discovery of transcription factor binding sites from location data.

Yuji Kawada, Yasubumi Sakakibara
Proceedings. IEEE Computational Systems Bioinformatics Conference, pp. 86-9. Published 2005-01-01. DOI: 10.1109/csb.2005.30 (https://doi.org/10.1109/csb.2005.30)
Citations: 1

Abstract

Motivation: Genome-wide location analyses based on chromatin immunoprecipitation (ChIP) data offer new insight for in silico analysis of transcriptional regulation.

Results: We propose a novel discriminative discovery framework for precisely identifying transcriptional regulatory motifs from both positive and negative samples, that is, the sets of upstream sequences of genes bound and not bound by a transcription factor (TF), based on genome-wide location data. In this framework, our goal is to find the discriminative motifs that best explain the location data, in the sense that the motifs precisely discriminate the positive samples from the negative ones. First, to discover an initial set of substrings that discriminate between positive and negative samples, we apply a decision tree learning method that produces a text-classification tree, and we extract several clusters of similar substrings from the internal nodes of the learned tree. Second, we construct an initial profile-HMM from each cluster to represent a putative motif and iteratively refine the profile-HMMs to improve discrimination accuracy. Our genome-wide experimental results on yeast show that our method successfully identifies the consensus sequences reported in the literature for known TFs and achieves significant performance in discriminating between positive and negative samples for all the TFs, whereas most other motif detection methods perform very poorly on this discrimination problem. Our learned profile-HMMs also improve the false negative predictions of the ChIP data.
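
To make the first step more concrete, the following is a minimal sketch of how a single node of a text-classification tree might choose a discriminative substring. It is not the authors' implementation: the abstract does not state the split criterion or the substring lengths, so the use of information gain, fixed-length k-mers, and the names entropy, information_gain, and best_split are all assumptions made for illustration. Clustering the substrings and refining profile-HMMs are not shown.

```python
import math
from typing import List, Tuple


def entropy(pos: int, neg: int) -> float:
    """Binary entropy (in bits) of a set with `pos` positive and `neg` negative items."""
    total = pos + neg
    if total == 0:
        return 0.0
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(kmer: str, positives: List[str], negatives: List[str]) -> float:
    """Entropy reduction from splitting sequences on presence/absence of `kmer`.

    Assumption: the node test is 'does the sequence contain this substring?'.
    """
    p_in = sum(kmer in s for s in positives)
    n_in = sum(kmer in s for s in negatives)
    p_out = len(positives) - p_in
    n_out = len(negatives) - n_in
    total = len(positives) + len(negatives)
    before = entropy(len(positives), len(negatives))
    after = ((p_in + n_in) / total) * entropy(p_in, n_in) \
        + ((p_out + n_out) / total) * entropy(p_out, n_out)
    return before - after


def best_split(positives: List[str], negatives: List[str], k: int) -> Tuple[float, str]:
    """Return (gain, kmer) for the length-k substring with the highest gain.

    Candidates are every k-mer seen in any sequence; ties go to the
    lexicographically smallest k-mer because candidates are scanned in sorted order.
    """
    candidates = {
        s[i:i + k]
        for s in positives + negatives
        for i in range(len(s) - k + 1)
    }
    return max(
        ((information_gain(c, positives, negatives), c) for c in sorted(candidates)),
        key=lambda pair: pair[0],
    )


if __name__ == "__main__":
    # Toy upstream sequences (made up for illustration, not real yeast data).
    bound = ["AACGTGTT", "TTCGTGAA", "GGCGTGCC"]
    unbound = ["AAAATTTT", "CCCCGGGG", "TTGGAACC"]
    print(best_split(bound, unbound, 4))
```

In a full tree learner this selection would be applied recursively to the two subsets produced by each split, and the substrings chosen at internal nodes would form the pool from which clusters are extracted.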
