Distant Supervision-based Relation Extraction for Literature-Related Biomedical Knowledge Graph Construction

Rui Hua, Zixin Shu, Dengying Yan, Kuo Yang, Xinyan Wang, Chuang Cheng, Xuezhong Zhou, Qiang Zhu
Journal: Current Chinese Science. Published: 2023-10-06. DOI: https://doi.org/10.2174/0122102981269053230921074451

Abstract

Background: Relation extraction is a crucial component of knowledge graph construction. However, it often requires a large amount of manual annotation, which is time-consuming and expensive. Distant supervision mitigates this challenge by mapping triple facts onto raw text, generating a large volume of pseudo-training data at minimal cost.

Objective: This study explores the potential of the distant supervision-based relation extraction approach. We aim to enhance knowledge reliability and facilitate new knowledge discovery by establishing associations between the literature and knowledge drawn from specific biomedical data or existing knowledge graphs.

Method: This study presents a methodology for constructing a biomedical knowledge graph using distant supervision techniques. By establishing links between knowledge entities and relevant literature sources, we systematically extract and integrate information to expand and enrich the knowledge graph. We identified five types of biomedical entities (e.g., diseases, symptoms, and genes) and four kinds of relationships. These were linked to PubMed literature and divided into training and testing datasets. To mitigate data noise, the training set was preprocessed, while the testing set was manually curated.

Results: We associated 230,698 triples from the existing knowledge graph with relevant literature. We also identified 205,148 new triples sourced directly from that literature.

Conclusion: Our study advances biomedical knowledge graph enrichment, particularly in the context of Traditional Chinese Medicine (TCM). By validating a substantial number of triples through literature associations and uncovering over 200,000 new triples, we have taken a significant step toward promoting evidence-based medicine in TCM. The results underscore the potential of distant supervision-based relation extraction to both validate and expand knowledge bases, contributing to the broader progression of evidence-based practice in TCM.

Keywords: Relation extraction, knowledge graph, distant supervision, named entity recognition, literature, biomedical knowledge graph.
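The core idea named in the Background, mapping triple facts onto raw text to obtain pseudo-training data, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Triple` class, the `distant_label` function, the relation names, and the toy sentences are all hypothetical, and plain case-insensitive substring matching stands in for whatever entity recognition and linking the study actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """A knowledge graph fact: (head entity, relation, tail entity)."""
    head: str
    relation: str
    tail: str


def distant_label(sentences, triples):
    """Pair each sentence with every triple whose head and tail both occur in it.

    This is the distant supervision assumption: a sentence that mentions both
    entities of a known fact is taken as a (possibly noisy) example of its relation.
    """
    labeled = []
    for sentence in sentences:
        text = sentence.lower()
        for triple in triples:
            if triple.head.lower() in text and triple.tail.lower() in text:
                labeled.append((sentence, triple))
    return labeled


# Toy data for illustration only; none of it comes from the study.
triples = [
    Triple("asthma", "has_symptom", "wheezing"),
    Triple("BRCA1", "associated_with", "breast cancer"),
]
sentences = [
    "Wheezing is a common presentation of asthma in children.",
    "BRCA1 mutations raise the lifetime risk of breast cancer.",
    "Asthma prevalence has increased in recent decades.",
]

labeled = distant_label(sentences, triples)
for sentence, triple in labeled:
    print(triple.relation, "<-", sentence)
```

The third sentence mentions only one entity of any triple, so it receives no label. A sentence can also mention both entities without expressing the relation, which is the kind of label noise the Method section addresses by preprocessing the training set and manually curating the testing set.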