利用基因表达和蛋白质类别寻找生物活性代谢途径的马尔可夫模型的层次混合。

Proceedings. IEEE Computational Systems Bioinformatics Conference Pub Date : 2004-01-01 DOI:10.1109/csb.2004.1332447

Hiroshi Mamitsuka, Yasushi Okuno

{"title":"利用基因表达和蛋白质类别寻找生物活性代谢途径的马尔可夫模型的层次混合。","authors":"Hiroshi Mamitsuka, Yasushi Okuno","doi":"10.1109/csb.2004.1332447","DOIUrl":null,"url":null,"abstract":"With the recent development of experimental high-throughput techniques, the type and volume of accumulating biological data have extremely increased these few years. Mining from different types of data might lead us to find new biological insights. We present a new methodology for systematically combining three different datasets to find biologically active metabolic paths/patterns. This method consists of two steps: First it synthesizes metabolic paths from a given set of chemical reactions, which are already known and whose enzymes are co-expressed, in an efficient manner. It then represents the obtained metabolic paths in a more comprehensible way through estimating parameters of a probabilistic model by using these synthesized paths. This model is built upon an assumption that an entire set of chemical reactions corresponds to a Markov state transition diagram. Furthermore, this model is a hierarchical latent variable model, containing a set of protein classes as a latent variable, for clustering input paths in terms of existing knowledge of protein classes. We tested the performance of our method using a main pathway of glycolysis, and found that our method achieved higher predictive performance for the issue of classifying gene expressions than those obtained by other unsupervised methods. We further analyzed the estimated parameters of our probabilistic models, and found that biologically active paths were clustered into only two or three patterns for each expression experiment type, and each pattern suggested some new long-range relations in the glycolysis pathway.","PeriodicalId":87417,"journal":{"name":"Proceedings. IEEE Computational Systems Bioinformatics Conference","volume":" ","pages":"341-52"},"PeriodicalIF":0.0000,"publicationDate":"2004-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/csb.2004.1332447","citationCount":"0","resultStr":"{\"title\":\"A hierarchical mixture of Markov models for finding biologically active metabolic paths using gene expression and protein classes.\",\"authors\":\"Hiroshi Mamitsuka, Yasushi Okuno\",\"doi\":\"10.1109/csb.2004.1332447\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With the recent development of experimental high-throughput techniques, the type and volume of accumulating biological data have extremely increased these few years. Mining from different types of data might lead us to find new biological insights. We present a new methodology for systematically combining three different datasets to find biologically active metabolic paths/patterns. This method consists of two steps: First it synthesizes metabolic paths from a given set of chemical reactions, which are already known and whose enzymes are co-expressed, in an efficient manner. It then represents the obtained metabolic paths in a more comprehensible way through estimating parameters of a probabilistic model by using these synthesized paths. This model is built upon an assumption that an entire set of chemical reactions corresponds to a Markov state transition diagram. Furthermore, this model is a hierarchical latent variable model, containing a set of protein classes as a latent variable, for clustering input paths in terms of existing knowledge of protein classes. We tested the performance of our method using a main pathway of glycolysis, and found that our method achieved higher predictive performance for the issue of classifying gene expressions than those obtained by other unsupervised methods. We further analyzed the estimated parameters of our probabilistic models, and found that biologically active paths were clustered into only two or three patterns for each expression experiment type, and each pattern suggested some new long-range relations in the glycolysis pathway.\",\"PeriodicalId\":87417,\"journal\":{\"name\":\"Proceedings. IEEE Computational Systems Bioinformatics Conference\",\"volume\":\" \",\"pages\":\"341-52\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2004-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://sci-hub-pdf.com/10.1109/csb.2004.1332447\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings. IEEE Computational Systems Bioinformatics Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/csb.2004.1332447\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. IEEE Computational Systems Bioinformatics Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/csb.2004.1332447","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

近年来，随着实验高通量技术的发展，积累生物学数据的种类和数量急剧增加。从不同类型的数据中挖掘可能会让我们找到新的生物学见解。我们提出了一种新的方法，系统地结合三种不同的数据集来寻找生物活性代谢途径/模式。该方法包括两个步骤:首先，它以一种有效的方式，从一组已知的化学反应中合成代谢途径，这些化学反应的酶是共同表达的。然后通过使用这些合成路径估计概率模型的参数，以更易于理解的方式表示所获得的代谢路径。这个模型建立在一个假设之上，即一整套化学反应对应于一个马尔可夫状态转换图。此外，该模型是一个分层潜变量模型，包含一组蛋白质类别作为潜变量，用于根据现有蛋白质类别知识对输入路径进行聚类。我们使用糖酵解的主要途径测试了我们的方法的性能，发现我们的方法在基因表达分类问题上取得了比其他无监督方法更高的预测性能。我们进一步分析了概率模型的估计参数，发现每种表达实验类型的生物活性路径仅聚为两种或三种模式，每种模式都表明糖酵解途径中存在一些新的远程关系。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

A hierarchical mixture of Markov models for finding biologically active metabolic paths using gene expression and protein classes.

With the recent development of experimental high-throughput techniques, the type and volume of accumulating biological data have extremely increased these few years. Mining from different types of data might lead us to find new biological insights. We present a new methodology for systematically combining three different datasets to find biologically active metabolic paths/patterns. This method consists of two steps: First it synthesizes metabolic paths from a given set of chemical reactions, which are already known and whose enzymes are co-expressed, in an efficient manner. It then represents the obtained metabolic paths in a more comprehensible way through estimating parameters of a probabilistic model by using these synthesized paths. This model is built upon an assumption that an entire set of chemical reactions corresponds to a Markov state transition diagram. Furthermore, this model is a hierarchical latent variable model, containing a set of protein classes as a latent variable, for clustering input paths in terms of existing knowledge of protein classes. We tested the performance of our method using a main pathway of glycolysis, and found that our method achieved higher predictive performance for the issue of classifying gene expressions than those obtained by other unsupervised methods. We further analyzed the estimated parameters of our probabilistic models, and found that biologically active paths were clustered into only two or three patterns for each expression experiment type, and each pattern suggested some new long-range relations in the glycolysis pathway.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings. IEEE Computational Systems Bioinformatics Conference

自引率

0.00%

发文量