Predicting protein-protein relationships from literature using collapsed variational latent Dirichlet allocation

Data and Text Mining in Bioinformatics. Pub Date: 2008-10-30. DOI: 10.1145/1458449.1458467

Tatsuya Asou, K. Eguchi


Citations: 7

Abstract

This paper investigates applying statistical topic models to extract and predict relationships between biological entities, especially protein mentions. A statistical topic model, Latent Dirichlet Allocation (LDA), is promising; however, it has not been investigated for such a task. In this paper, we apply state-of-the-art Collapsed Variational Bayesian inference and Gibbs sampling inference to estimate the LDA model, and compare them in terms of log-likelihood, classification accuracy, and retrieval effectiveness. We demonstrate through experiments that Collapsed Variational LDA gives better results than Gibbs sampling, especially in terms of classification accuracy and retrieval effectiveness on the protein-protein relationship prediction task.
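To make the comparison in the abstract concrete, the following is a minimal sketch of the two inference styles on a toy corpus. It is not the authors' implementation. Two assumptions are worth stating up front: the soft-assignment branch uses CVB0, a simplified zeroth-order variant of collapsed variational Bayes, rather than the full second-order method the abstract refers to; and the function name `fit_lda`, the hyperparameter values, and the toy word ids are all hypothetical, chosen only for illustration.

```python
import random


def fit_lda(docs, K, V, method="cvb0", alpha=0.1, beta=0.01, iters=50, seed=0):
    """Estimate per-document topic proportions for LDA on a toy corpus.

    docs:   list of documents, each a list of integer word ids in [0, V).
    method: "gibbs" (hard, sampled assignments) or "cvb0" (soft assignments;
            a simplified stand-in for collapsed variational Bayes).
    """
    rng = random.Random(seed)
    n_dk = [[0.0] * K for _ in docs]      # (expected) topic counts per document
    n_kw = [[0.0] * V for _ in range(K)]  # (expected) word counts per topic
    n_k = [0.0] * K                       # (expected) token totals per topic

    def shift(d, w, g, sign):
        # Add (sign=+1) or remove (sign=-1) one token's assignment vector g.
        for k in range(K):
            n_dk[d][k] += sign * g[k]
            n_kw[k][w] += sign * g[k]
            n_k[k] += sign * g[k]

    def draw(weights):
        if method == "gibbs":
            # Sample a single topic and represent it as a one-hot vector.
            k = rng.choices(range(K), weights=weights)[0]
            return [1.0 if j == k else 0.0 for j in range(K)]
        # CVB0: keep the whole normalized distribution over topics.
        total = sum(weights)
        return [x / total for x in weights]

    # Random initialization of every token's assignment.
    assign = []
    for d, doc in enumerate(docs):
        assign.append([])
        for w in doc:
            g = draw([rng.random() + 0.01 for _ in range(K)])
            assign[d].append(g)
            shift(d, w, g, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                shift(d, w, assign[d][i], -1)  # exclude the current token
                weights = [
                    (n_dk[d][k] + alpha) * (n_kw[k][w] + beta) / (n_k[k] + V * beta)
                    for k in range(K)
                ]
                assign[d][i] = draw(weights)
                shift(d, w, assign[d][i], +1)

    # Posterior-mean topic proportions for each document.
    return [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]


# Toy corpus (hypothetical word ids): docs 0-1 use words 0-2, docs 2-3 use words 3-5.
toy_docs = [[0, 1, 2, 0, 1], [1, 2, 0, 2], [3, 4, 5, 3], [4, 5, 3, 5, 4]]
for name in ("gibbs", "cvb0"):
    theta = fit_lda(toy_docs, K=2, V=6, method=name)
    print(name, [[round(p, 2) for p in row] for row in theta])
```

Both branches share the same per-token update, proportional to (n_dk + alpha)(n_kw + beta) / (n_k + V*beta). The only difference in this sketch is whether a single topic is sampled from that distribution or the full distribution is kept as a soft assignment. The resulting per-document topic proportions could then serve as features for a downstream classification or retrieval step of the kind the abstract evaluates.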

