Biomedical association mining and validation

In Silico Biology (JCR Q2, Medicine). Pub Date: 2010-02-15. DOI: 10.1145/1722024.1722035
P. Gandra, M. Pradhan, M. Palakal
Citations: 2

Abstract

During the last decade, the volume of data published in the biomedical literature has increased exponentially. With this growth, it has become hard to read all the papers manually to find the required information. Many text mining algorithms and approaches have been developed to extract information from the existing literature. One important task is finding associations between functional terms such as genes, proteins, drugs, and diseases. These associations can be causal, explicit, or implicit. One of the most common applications is mining protein-protein interactions from PubMed. The focus of the present study is to identify and validate implicit protein-protein associations, as these are hard to identify from the literature. These associations, when detected automatically, are noisy and need to be validated for their biological significance. During validation, the associations were passed through a series of filters and an algorithm to remove the noise present in the data. In this study, we used 16 gene IDs to retrieve 32,693 documents with 193,738 sentences related to regenerative biology from the PubMed database. From these sentences, BioMap found 10,004 explicit and 30,000 implicit protein interaction pairs, which were validated using the proposed methodology. Finally, 308 implicit pairs were identified as the outcome of this methodology. These results indicate that the proposed methods can be used effectively for the biological verification of implicit protein-protein interactions obtained through literature mining.
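The abstract does not spell out how explicit and implicit pairs are derived or which filters are applied, so the following is only a minimal sketch under stated assumptions, not BioMap's method. It assumes that an explicit pair is two protein names co-occurring in one sentence, and that an implicit pair is two proteins that never co-occur but share at least one explicit partner. The function names, the `min_shared` threshold (a stand-in for a noise filter), and the toy sentences are all hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations


def explicit_pairs(sentences, proteins):
    """Assumed definition: a pair is explicit if both names occur in one sentence."""
    pairs = set()
    for sentence in sentences:
        # Crude tokenisation on alphanumeric runs; real systems use entity taggers.
        tokens = set(re.findall(r"[a-z0-9]+", sentence.lower()))
        found = sorted(p for p in proteins if p.lower() in tokens)
        pairs.update(combinations(found, 2))
    return pairs


def implicit_pairs(explicit, min_shared=1):
    """Assumed definition: no explicit link, but at least min_shared common partners."""
    neighbours = defaultdict(set)
    for a, b in explicit:
        neighbours[a].add(b)
        neighbours[b].add(a)
    result = {}
    for a, b in combinations(sorted(neighbours), 2):
        if b in neighbours[a]:
            continue  # already an explicit pair
        shared = neighbours[a] & neighbours[b]
        if len(shared) >= min_shared:
            result[(a, b)] = shared
    return result


# Made-up toy sentences, for illustration only.
sentences = [
    "FGF2 activates MAPK1 in regenerating tissue",
    "MAPK1 phosphorylates STAT3",
    "FGF2 binds FGFR1",
]
proteins = ["FGF2", "MAPK1", "STAT3", "FGFR1"]

explicit = explicit_pairs(sentences, proteins)
implicit = implicit_pairs(explicit)
for pair, via in sorted(implicit.items()):
    print(pair, "via", sorted(via))
```

Raising `min_shared` discards candidates supported by few intermediates, which is one simple way a filter could shrink a large, noisy candidate set toward a small validated one.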
Source journal
In Silico Biology (Computer Science: Computational Theory and Mathematics)
CiteScore: 2.20
Self-citation rate: 0.00%
Articles published: 1
About the journal: The considerable "algorithmic complexity" of biological systems requires a huge amount of detailed information for their complete description. Although far from complete, the overwhelming quantity of small pieces of information gathered for all kinds of biological systems at the molecular and cellular level requires computational tools to be adequately stored and interpreted. Interpreting data means abstracting them as far as possible to provide a systematic, integrative view of biology. Most of the presently available scientific journals focus either on accumulating more data from elaborate experimental approaches, or on presenting new algorithms for the interpretation of these data. Both approaches are meritorious.