A structure-based approach to predicting in vitro transcription factor-DNA interaction

Zhenzhu Gao, Jianhua Ruan
Published in: 2013 IEEE International Workshop on Genomic Signal Processing and Statistics, November 2013
DOI: 10.1109/GENSIPS.2013.6735915
Citations: 0

Abstract

Summary form only given. Understanding the mechanism of transcriptional regulation remains an inspiring open problem in molecular biology. Among the popular methods for modeling transcription factor binding sites (TFBSs), position-specific weight matrices and k-mer-based approaches have been highly successful. However, neither approach considers the structural properties of a binding site. Recently, Bauer et al. (2010) presented a TFBS modeling and prediction approach that uses sequence-specific chemical and structural features of DNA. However, that study relied on ChIP-chip assays, and the in vivo protein-DNA interactions these assays observe are not necessarily direct, because some TFs interact with DNA largely through other partner proteins. An evaluation on a suitable in vitro dataset would therefore be more appropriate for revealing the benefit of such physicochemical features in modeling TF-DNA interactions. In vitro protein-binding microarray (PBM) experiments have recently greatly improved the understanding of TF-DNA interactions. A PBM experiment is a high-throughput assay that measures the in vitro binding affinity of a given TF for each sequence on the probe array. Because it eliminates confounding factors typical of ChIP-based experiments, such as transcriptional co-factors, PBM data are an excellent source of information for developing structural models of TF-DNA interactions. On the other hand, directly mapping 3-mer- or 4-mer-based meta-features onto candidate DNA binding sequences, as Bauer et al. did, may not reflect the nature of TF-DNA binding, since a TFBS usually spans 8 to 12 base pairs. As a result, conventional machine-learning algorithms, which rely on well-structured pairs of feature vectors and labels, may not model PBM data well. In this paper, we propose a novel approach that predicts in vitro TF binding from the structural properties of DNA using a multiple-instance learning algorithm. Unlike conventional (single-instance) learning algorithms, our multiple-instance learning algorithm does not require knowledge of the actual binding site within a candidate probe sequence, yet can still take full advantage of physicochemical properties in modeling and predicting TF-DNA interactions. Evaluation on in vitro PBM data for twenty mouse TFs shows that our model performs significantly better than several k-mer-based or structure-based single-instance learning algorithms. These results indicate that combining multiple-instance learning with the structural properties of DNA is a promising direction for studying biological regulatory networks.
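The abstract does not specify the authors' multiple-instance algorithm, structural features, or window length, so the following is only a rough Python sketch of the general idea, under stated assumptions. Each probe is treated as a bag of overlapping 10-bp windows (chosen to fall within the 8 to 12 bp TFBS range the abstract cites). Each window is encoded by a per-dinucleotide "structural" value. A probe's score is the score of its best window, which is the common max-pooling MIL assumption. The values in `TOY_STEP_VALUE`, the window width, the logistic loss, and every function name (`make_bag`, `bag_score`, `train_max_mil`, `sigmoid`) are hypothetical choices for illustration, not the method or data of Gao and Ruan.

```python
import math
import random

# HYPOTHETICAL per-dinucleotide values standing in for a DNA structural or
# physicochemical parameter. Invented for illustration; NOT real measurements.
TOY_STEP_VALUE = {
    "AA": 0.1, "AC": 0.4, "AG": 0.3, "AT": 0.2,
    "CA": 0.6, "CC": 0.5, "CG": 0.9, "CT": 0.3,
    "GA": 0.4, "GC": 0.7, "GG": 0.5, "GT": 0.4,
    "TA": 0.8, "TC": 0.4, "TG": 0.6, "TT": 0.1,
}


def make_bag(probe, width=10):
    """Turn one probe into a bag: every overlapping window of `width` bp is an
    instance, encoded as its (width - 1) dinucleotide-step values."""
    probe = probe.upper()
    return [
        [TOY_STEP_VALUE[probe[i + k:i + k + 2]] for k in range(width - 1)]
        for i in range(len(probe) - width + 1)
    ]


def instance_score(weights, bias, x):
    """Linear score of a single candidate binding window."""
    return sum(w * v for w, v in zip(weights, x)) + bias


def bag_score(weights, bias, bag):
    """Max-pooling MIL assumption: a probe scores as high as its best window."""
    return max(instance_score(weights, bias, x) for x in bag)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_max_mil(bags, labels, width=10, epochs=200, lr=0.1, seed=0):
    """Fit a linear max-pooling MIL model with logistic loss by stochastic
    subgradient descent. Labels are 1 (bound probe) or 0 (unbound probe)."""
    rng = random.Random(seed)
    weights, bias = [0.0] * (width - 1), 0.0
    order = list(range(len(bags)))
    for _ in range(epochs):
        rng.shuffle(order)
        for j in order:
            # The subgradient of max() flows only through the top-scoring window,
            # so no binding-site position is ever supplied to the learner.
            best = max(bags[j], key=lambda x: instance_score(weights, bias, x))
            grad = sigmoid(instance_score(weights, bias, best)) - labels[j]
            weights = [w - lr * grad * v for w, v in zip(weights, best)]
            bias -= lr * grad
    return weights, bias


# Usage on two toy probes: one with a planted CG-rich site, one poly-A background.
bags = [make_bag("AAAAA" + "CGCGCGCGCG" + "AAAAA"), make_bag("A" * 20)]
weights, bias = train_max_mil(bags, [1, 0])
print([round(bag_score(weights, bias, b), 3) for b in bags])
```

Because the learner only ever sees a label per probe and picks its own best-scoring window, it never needs the binding-site location. That is the property the abstract attributes to multiple-instance learning. A realistic version would replace the toy table with measured structural parameters and derive labels or intensities from actual PBM data.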