Extracting and Measuring Uncertain Biomedical Knowledge from Scientific Statements

Xin Guo, Yuming Chen, Jian Du, Erdan Dong
{"title":"Extracting and Measuring Uncertain Biomedical Knowledge from Scientific Statements","authors":"Xin Guo, Yuming Chen, Jian Du, Erdan Dong","doi":"10.2478/jdis-2022-0008","DOIUrl":null,"url":null,"abstract":"Abstract Purpose Given the information overload of scientific literature, there is an increasing need for computable biomedical knowledge buried in free text. This study aimed to develop a novel approach to extracting and measuring uncertain biomedical knowledge from scientific statements. Design/methodology/approach Taking cardiovascular research publications in China as a sample, we extracted subject–predicate–object triples (SPO triples) as knowledge units and unknown/hedging/conflicting uncertainties as the knowledge context. We introduced information entropy (IE) as potential metric to quantify the uncertainty of epistemic status of scientific knowledge represented at subject-object pairs (SO pairs) levels. Findings The results indicated an extraordinary growth of cardiovascular publications in China while only a modest growth of the novel SPO triples. After evaluating the uncertainty of biomedical knowledge with IE, we identified the Top 10 SO pairs with highest IE, which implied the epistemic status pluralism. Visual presentation of the SO pairs overlaid with uncertainty provided a comprehensive overview of clusters of biomedical knowledge and contending topics in cardiovascular research. Research limitations The current methods didn’t distinguish the specificity and probabilities of uncertainty cue words. The number of sentences surrounding a given triple may also influence the value of IE. Practical implications Our approach identified major uncertain knowledge areas such as diagnostic biomarkers, genetic polymorphism and co-existing risk factors related to cardiovascular diseases in China. These areas are suggested to be prioritized; new hypotheses need to be verified, while disputes, conflicts, and contradictions need to be settled. Originality/value We provided a novel approach by combining natural language processing and computational linguistics with informetric methods to extract and measure uncertain knowledge from scientific statements.","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"7 1","pages":"6 - 30"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of data and information science (Warsaw, Poland)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2478/jdis-2022-0008","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Abstract Purpose Given the information overload of scientific literature, there is an increasing need for computable biomedical knowledge buried in free text. This study aimed to develop a novel approach to extracting and measuring uncertain biomedical knowledge from scientific statements. Design/methodology/approach Taking cardiovascular research publications in China as a sample, we extracted subject–predicate–object triples (SPO triples) as knowledge units and unknown/hedging/conflicting uncertainties as the knowledge context. We introduced information entropy (IE) as potential metric to quantify the uncertainty of epistemic status of scientific knowledge represented at subject-object pairs (SO pairs) levels. Findings The results indicated an extraordinary growth of cardiovascular publications in China while only a modest growth of the novel SPO triples. After evaluating the uncertainty of biomedical knowledge with IE, we identified the Top 10 SO pairs with highest IE, which implied the epistemic status pluralism. Visual presentation of the SO pairs overlaid with uncertainty provided a comprehensive overview of clusters of biomedical knowledge and contending topics in cardiovascular research. Research limitations The current methods didn’t distinguish the specificity and probabilities of uncertainty cue words. The number of sentences surrounding a given triple may also influence the value of IE. Practical implications Our approach identified major uncertain knowledge areas such as diagnostic biomarkers, genetic polymorphism and co-existing risk factors related to cardiovascular diseases in China. These areas are suggested to be prioritized; new hypotheses need to be verified, while disputes, conflicts, and contradictions need to be settled. Originality/value We provided a novel approach by combining natural language processing and computational linguistics with informetric methods to extract and measure uncertain knowledge from scientific statements.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
从科学声明中提取和测量不确定的生物医学知识
摘要目的鉴于科学文献的信息过载,人们越来越需要隐藏在自由文本中的可计算生物医学知识。本研究旨在开发一种从科学声明中提取和测量不确定生物医学知识的新方法。设计/方法论/方法以中国心血管研究出版物为样本,我们提取了主-谓语-宾语三元组(SPO三元组)作为知识单元,提取了未知/对冲/冲突的不确定性作为知识背景。我们引入了信息熵(IE)作为潜在的度量标准,以量化在主客体对(SO对)水平上表示的科学知识的认识状态的不确定性。研究结果表明,中国心血管疾病出版物的增长非同寻常,而新的SPO三倍出版物仅略有增长。在用IE评估生物医学知识的不确定性后,我们确定了IE最高的前10个SO对,这意味着认知地位的多元主义。覆盖着不确定性的SO对的视觉呈现提供了生物医学知识集群和心血管研究中竞争主题的全面概述。研究局限性目前的方法没有区分不确定性提示词的特异性和概率。围绕给定三元组的句子数量也可能影响IE的价值。实际意义我们的方法确定了主要的不确定知识领域,如诊断生物标志物、遗传多态性和与中国心血管疾病相关的共存风险因素。建议优先考虑这些领域;新的假设需要验证,而争议、冲突和矛盾需要解决。独创性/价值我们提供了一种新颖的方法,将自然语言处理和计算语言学与信息测量方法相结合,从科学陈述中提取和测量不确定的知识。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Editorial board publication strategy and acceptance rates in Turkish national journals Multimodal sentiment analysis for social media contents during public emergencies Perspectives from a publishing ethics and research integrity team for required improvements Build neural network models to identify and correct news headlines exaggerating obesity-related scientific findings An author credit allocation method with improved distinguishability and robustness
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1