Utilizing ChatGPT as a scientific reasoning engine to differentiate conflicting evidence and summarize challenges in controversial clinical questions.

IF 4.7 · CAS Region 2 (Medicine) · Q1 (Computer Science, Information Systems) · Journal of the American Medical Informatics Association · Pub Date: 2024-06-20 · DOI: 10.1093/jamia/ocae100
Shiyao Xie, Wenjing Zhao, Guanghui Deng, Guohua He, Na He, Zhenhua Lu, Weihua Hu, Mingming Zhao, Jian Du
{"title":"Utilizing ChatGPT as a scientific reasoning engine to differentiate conflicting evidence and summarize challenges in controversial clinical questions.","authors":"Shiyao Xie, Wenjing Zhao, Guanghui Deng, Guohua He, Na He, Zhenhua Lu, Weihua Hu, Mingming Zhao, Jian Du","doi":"10.1093/jamia/ocae100","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>Synthesizing and evaluating inconsistent medical evidence is essential in evidence-based medicine. This study aimed to employ ChatGPT as a sophisticated scientific reasoning engine to identify conflicting clinical evidence and summarize unresolved questions to inform further research.</p><p><strong>Materials and methods: </strong>We evaluated ChatGPT's effectiveness in identifying conflicting evidence and investigated its principles of logical reasoning. An automated framework was developed to generate a PubMed dataset focused on controversial clinical topics. ChatGPT analyzed this dataset to identify consensus and controversy, and to formulate unsolved research questions. Expert evaluations were conducted 1) on the consensus and controversy for factual consistency, comprehensiveness, and potential harm and, 2) on the research questions for relevance, innovation, clarity, and specificity.</p><p><strong>Results: </strong>The gpt-4-1106-preview model achieved a 90% recall rate in detecting inconsistent claim pairs within a ternary assertions setup. Notably, without explicit reasoning prompts, ChatGPT provided sound reasoning for the assertions between claims and hypotheses, based on an analysis grounded in relevance, specificity, and certainty. ChatGPT's conclusions of consensus and controversies in clinical literature were comprehensive and factually consistent. The research questions proposed by ChatGPT received high expert ratings.</p><p><strong>Discussion: </strong>Our experiment implies that, in evaluating the relationship between evidence and claims, ChatGPT considered more detailed information beyond a straightforward assessment of sentimental orientation. This ability to process intricate information and conduct scientific reasoning regarding sentiment is noteworthy, particularly as this pattern emerged without explicit guidance or directives in prompts, highlighting ChatGPT's inherent logical reasoning capabilities.</p><p><strong>Conclusion: </strong>This study demonstrated ChatGPT's capacity to evaluate and interpret scientific claims. Such proficiency can be generalized to broader clinical research literature. ChatGPT effectively aids in facilitating clinical studies by proposing unresolved challenges based on analysis of existing studies. 
However, caution is advised as ChatGPT's outputs are inferences drawn from the input literature and could be harmful to clinical practice.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"1551-1560"},"PeriodicalIF":4.7000,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11187493/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the American Medical Informatics Association","FirstCategoryId":"91","ListUrlMain":"https://doi.org/10.1093/jamia/ocae100","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

Objective: Synthesizing and evaluating inconsistent medical evidence is essential in evidence-based medicine. This study aimed to employ ChatGPT as a sophisticated scientific reasoning engine to identify conflicting clinical evidence and summarize unresolved questions to inform further research.

Materials and methods: We evaluated ChatGPT's effectiveness in identifying conflicting evidence and investigated its principles of logical reasoning. An automated framework was developed to generate a PubMed dataset focused on controversial clinical topics. ChatGPT analyzed this dataset to identify consensus and controversy and to formulate unresolved research questions. Expert evaluations were conducted 1) on the consensus and controversy, for factual consistency, comprehensiveness, and potential harm, and 2) on the research questions, for relevance, innovation, clarity, and specificity.
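The abstract does not include the retrieval code, but the dataset-construction step can be approximated with NCBI's public E-utilities API. The sketch below is a minimal, hypothetical example: the query string and result limit are placeholders, not the authors' actual search strategy.

```python
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def search_pubmed(query: str, retmax: int = 20) -> list[str]:
    """ESearch: return PubMed IDs (PMIDs) matching a query."""
    r = requests.get(f"{EUTILS}/esearch.fcgi",
                     params={"db": "pubmed", "term": query,
                             "retmax": retmax, "retmode": "json"})
    r.raise_for_status()
    return r.json()["esearchresult"]["idlist"]

def fetch_abstracts(pmids: list[str]) -> str:
    """EFetch: retrieve the abstracts for a list of PMIDs as plain text."""
    r = requests.get(f"{EUTILS}/efetch.fcgi",
                     params={"db": "pubmed", "id": ",".join(pmids),
                             "rettype": "abstract", "retmode": "text"})
    r.raise_for_status()
    return r.text

# Hypothetical controversial topic; not the authors' actual query.
pmids = search_pubmed("aspirin AND primary prevention AND cardiovascular")
corpus = fetch_abstracts(pmids)
```

From such a corpus, individual claims would then be extracted and paired for the assertion analysis described in the Results.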

Results: The gpt-4-1106-preview model achieved a 90% recall rate in detecting inconsistent claim pairs within a ternary assertion setup. Notably, without explicit reasoning prompts, ChatGPT provided sound reasoning for the assertions between claims and hypotheses, based on an analysis grounded in relevance, specificity, and certainty. ChatGPT's conclusions on consensus and controversy in the clinical literature were comprehensive and factually consistent. The research questions it proposed received high expert ratings.
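The abstract does not reproduce the prompt used for the ternary assertion task, but the setup can be illustrated with the OpenAI Python SDK. This is a minimal sketch under assumed prompt wording and labels (SUPPORT / CONTRADICT / NEUTRAL); only the model name, gpt-4-1106-preview, comes from the paper.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical prompt; the paper's actual wording is not given in the abstract.
PROMPT = (
    "Claim A: {a}\n"
    "Claim B: {b}\n"
    "Does Claim A SUPPORT, CONTRADICT, or stand NEUTRAL toward Claim B? "
    "Answer with exactly one word."
)

def classify_pair(claim_a: str, claim_b: str) -> str:
    """Assign one of three assertion labels to a pair of clinical claims."""
    resp = client.chat.completions.create(
        model="gpt-4-1106-preview",  # model reported in the paper
        temperature=0,               # deterministic labeling
        messages=[{"role": "user",
                   "content": PROMPT.format(a=claim_a, b=claim_b)}],
    )
    return resp.choices[0].message.content.strip().upper()

label = classify_pair(
    "Aspirin reduces cardiovascular events in primary prevention.",
    "Aspirin showed no net benefit for primary prevention in older adults.",
)
print(label)  # expected: CONTRADICT
```

Recall on inconsistent pairs would then be computed by comparing these labels against expert annotations, counting CONTRADICT pairs the model recovered.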

Discussion: Our experiment suggests that, in evaluating the relationship between evidence and claims, ChatGPT considered detailed information beyond a straightforward assessment of sentiment orientation. This ability to process intricate information and conduct scientific reasoning about sentiment is noteworthy, particularly because the pattern emerged without explicit guidance or directives in the prompts, highlighting ChatGPT's inherent logical reasoning capabilities.

Conclusion: This study demonstrated ChatGPT's capacity to evaluate and interpret scientific claims, a proficiency that can generalize to the broader clinical research literature. ChatGPT can effectively aid clinical research by proposing unresolved challenges based on its analysis of existing studies. However, caution is advised: ChatGPT's outputs are inferences drawn from the input literature and could be harmful to clinical practice.

Source journal
Journal of the American Medical Informatics Association (Medicine - Computer Science: Interdisciplinary Applications)
CiteScore: 14.50
Self-citation rate: 7.80%
Articles per year: 230
Review time: 3-8 weeks
Journal description: JAMIA is AMIA's premier peer-reviewed journal for biomedical and health informatics. Covering the full spectrum of activities in the field, JAMIA includes informatics articles in the areas of clinical care, clinical research, translational science, implementation science, imaging, education, consumer health, public health, and policy. JAMIA's articles describe innovative informatics research and systems that help to advance biomedical science and to promote health. Case reports, perspectives, and reviews also help readers stay connected with the most important informatics developments in implementation, policy, and education.
Latest articles from this journal
Efficacy of the mLab App: a randomized clinical trial for increasing HIV testing uptake using mobile technology.
Machine learning-based prediction models in medical decision-making in kidney disease: patient, caregiver, and clinician perspectives on trust and appropriate use.
Research for all: building a diverse researcher community for the All of Us Research Program.
Learning health system linchpins: information exchange and a common data model.
Oncointerpreter.ai enables interactive, personalized summarization of cancer diagnostics data.