{"title":"基于结构的体外转录因子- dna相互作用预测方法","authors":"Zhenzhu Gao, Jianhua Ruan","doi":"10.1109/GENSIPS.2013.6735915","DOIUrl":null,"url":null,"abstract":"Summary form only given. Understanding the mechanism of transcriptional regulation remains to be an inspiring stage of molecular biology. Within the popular methods for modeling TFBS, position-specific weight matrix and k-mer based approaches have gained great success. However, both approaches fail to consider the structural properties of a binding site. Recently, a novel TFBS modeling and predicting approach is presented by Bauer et al. (2010), where the sequence-specific chemical and structural features of DNA are applied. However, the in vivo protein-DNA interactions observed in ChIP-chip assays, which were used in this study, are not necessarily direct, as some TFs tend to interact with DNAs extensively through other partners. Therefore, an evaluation on a proper in vitro dataset would be more appropriate to reveal the benefit of such physicochemical features in modeling TF-DNA interactions. Recently, in vitro protein-binding microarray experiment has greatly improved the understanding of transcription factor-DNA interaction. It is a high-throughput experiment used to measure the in vitro binding affinity of a given TF to the sequences on the probe array. Because typical confounding factors such as transcription co-factors present in ChIP-based experiments are eliminated, PBM data provide an excellent information source to develop structural models for TF-DNA interactions. On the other hand, directly mapping of the 3-mer or 4-mer based meta-features to the candidate DNA binding sequences as in their work may not reflect the TF-DNA binding nature, since a TFBS is usually an 8 to 12 base-pair. As a result, conventionally machine-learning algorithms, which rely on well-structured feature vector and label pairs, may not work well in modeling PBM data. In this paper we propose a novel approach to predicts in vitro transcription factor binding based on the structural properties of DNA using a so-called multiple-instance learning algorithm. Compared to conventional (single-instance based) learning algorithms, our multi-instance learning-based algorithm does not require the knowledge of the actual binding site within a candidate probe sequence, yet can still take full advantage of the physicochemical properties in modeling and predicting TF-DNA interactions. Evaluation on an in vitro protein binding microarray data of twenty mouse TFs shows that our new model performs significantly better than several k-mer or structure-based single-instance learning algorithms. It indicates that combining multi-instance learning and structural properties of DNA has promising potential for studying biological regulatory networks.","PeriodicalId":336511,"journal":{"name":"2013 IEEE International Workshop on Genomic Signal Processing and Statistics","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A structure-based approach to predicting in vitro transcription factor-DNA interaction\",\"authors\":\"Zhenzhu Gao, Jianhua Ruan\",\"doi\":\"10.1109/GENSIPS.2013.6735915\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Summary form only given. Understanding the mechanism of transcriptional regulation remains to be an inspiring stage of molecular biology. Within the popular methods for modeling TFBS, position-specific weight matrix and k-mer based approaches have gained great success. However, both approaches fail to consider the structural properties of a binding site. Recently, a novel TFBS modeling and predicting approach is presented by Bauer et al. (2010), where the sequence-specific chemical and structural features of DNA are applied. However, the in vivo protein-DNA interactions observed in ChIP-chip assays, which were used in this study, are not necessarily direct, as some TFs tend to interact with DNAs extensively through other partners. Therefore, an evaluation on a proper in vitro dataset would be more appropriate to reveal the benefit of such physicochemical features in modeling TF-DNA interactions. Recently, in vitro protein-binding microarray experiment has greatly improved the understanding of transcription factor-DNA interaction. It is a high-throughput experiment used to measure the in vitro binding affinity of a given TF to the sequences on the probe array. Because typical confounding factors such as transcription co-factors present in ChIP-based experiments are eliminated, PBM data provide an excellent information source to develop structural models for TF-DNA interactions. On the other hand, directly mapping of the 3-mer or 4-mer based meta-features to the candidate DNA binding sequences as in their work may not reflect the TF-DNA binding nature, since a TFBS is usually an 8 to 12 base-pair. As a result, conventionally machine-learning algorithms, which rely on well-structured feature vector and label pairs, may not work well in modeling PBM data. In this paper we propose a novel approach to predicts in vitro transcription factor binding based on the structural properties of DNA using a so-called multiple-instance learning algorithm. Compared to conventional (single-instance based) learning algorithms, our multi-instance learning-based algorithm does not require the knowledge of the actual binding site within a candidate probe sequence, yet can still take full advantage of the physicochemical properties in modeling and predicting TF-DNA interactions. Evaluation on an in vitro protein binding microarray data of twenty mouse TFs shows that our new model performs significantly better than several k-mer or structure-based single-instance learning algorithms. It indicates that combining multi-instance learning and structural properties of DNA has promising potential for studying biological regulatory networks.\",\"PeriodicalId\":336511,\"journal\":{\"name\":\"2013 IEEE International Workshop on Genomic Signal Processing and Statistics\",\"volume\":\"6 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 IEEE International Workshop on Genomic Signal Processing and Statistics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/GENSIPS.2013.6735915\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE International Workshop on Genomic Signal Processing and Statistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/GENSIPS.2013.6735915","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A structure-based approach to predicting in vitro transcription factor-DNA interaction
Summary form only given. Understanding the mechanism of transcriptional regulation remains to be an inspiring stage of molecular biology. Within the popular methods for modeling TFBS, position-specific weight matrix and k-mer based approaches have gained great success. However, both approaches fail to consider the structural properties of a binding site. Recently, a novel TFBS modeling and predicting approach is presented by Bauer et al. (2010), where the sequence-specific chemical and structural features of DNA are applied. However, the in vivo protein-DNA interactions observed in ChIP-chip assays, which were used in this study, are not necessarily direct, as some TFs tend to interact with DNAs extensively through other partners. Therefore, an evaluation on a proper in vitro dataset would be more appropriate to reveal the benefit of such physicochemical features in modeling TF-DNA interactions. Recently, in vitro protein-binding microarray experiment has greatly improved the understanding of transcription factor-DNA interaction. It is a high-throughput experiment used to measure the in vitro binding affinity of a given TF to the sequences on the probe array. Because typical confounding factors such as transcription co-factors present in ChIP-based experiments are eliminated, PBM data provide an excellent information source to develop structural models for TF-DNA interactions. On the other hand, directly mapping of the 3-mer or 4-mer based meta-features to the candidate DNA binding sequences as in their work may not reflect the TF-DNA binding nature, since a TFBS is usually an 8 to 12 base-pair. As a result, conventionally machine-learning algorithms, which rely on well-structured feature vector and label pairs, may not work well in modeling PBM data. In this paper we propose a novel approach to predicts in vitro transcription factor binding based on the structural properties of DNA using a so-called multiple-instance learning algorithm. Compared to conventional (single-instance based) learning algorithms, our multi-instance learning-based algorithm does not require the knowledge of the actual binding site within a candidate probe sequence, yet can still take full advantage of the physicochemical properties in modeling and predicting TF-DNA interactions. Evaluation on an in vitro protein binding microarray data of twenty mouse TFs shows that our new model performs significantly better than several k-mer or structure-based single-instance learning algorithms. It indicates that combining multi-instance learning and structural properties of DNA has promising potential for studying biological regulatory networks.