Evaluating ChatGPT's diagnostic potential for pathology images.

IF 3.1 | Zone 3 (Medicine) | JCR Q1 (Medicine, General & Internal) | Frontiers in Medicine | Pub Date: 2025-01-23 | eCollection Date: 2024-01-01 | DOI: 10.3389/fmed.2024.1507203
Liya Ding, Lei Fan, Miao Shen, Yawen Wang, Kaiqin Sheng, Zijuan Zou, Huimin An, Zhinong Jiang
Frontiers in Medicine, volume 11, article 1507203. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11798939/pdf/
Citations: 0

Abstract

Background: Chat Generative Pretrained Transformer (ChatGPT) is a large language model (LLM) developed by OpenAI, known for its extensive knowledge base and interactive capabilities. These attributes make it a valuable tool in the medical field, particularly for tasks such as answering medical questions, drafting clinical notes, and optimizing the generation of radiology reports. However, maintaining accuracy in medical contexts is the biggest challenge to employing GPT-4 in a clinical setting. This study aims to investigate the accuracy of GPT-4, which can process both text and image inputs, in generating diagnoses from pathological images.

Methods: This study analyzed 44 histopathological images from 16 organs and 100 colorectal biopsy photomicrographs. The initial evaluation was conducted using the standard GPT-4 model in January 2024, with a subsequent re-evaluation performed in July 2024. The diagnostic accuracy of GPT-4 was assessed by comparing its outputs to a reference standard using statistical measures. Additionally, four pathologists independently reviewed the same images to compare their diagnoses with the model's outputs. Both scanned and photographed images were tested to evaluate GPT-4's generalization ability across different image types.
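The comparison against a reference standard described above can be illustrated with a minimal sketch. It assumes accuracy is the fraction of matching labels and sensitivity is the fraction of reference-positive cases the model also labels positive; the abstract does not spell out the exact computation, and the function names and labels below are hypothetical, not taken from the study.

```python
def accuracy(reference, predicted):
    """Fraction of cases where the predicted label equals the reference label."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(r == p for r, p in zip(reference, predicted))
    return matches / len(reference)


def sensitivity(reference, predicted, positive):
    """True positives divided by all reference-positive cases."""
    if len(reference) != len(predicted):
        raise ValueError("label lists must be of equal length")
    positives = [p for r, p in zip(reference, predicted) if r == positive]
    if not positives:
        raise ValueError("no reference-positive cases")
    return sum(p == positive for p in positives) / len(positives)


# Made-up labels for illustration only; these are NOT data from the study.
reference = ["adenocarcinoma", "HGD", "adenocarcinoma", "HGD", "adenocarcinoma"]
predicted = ["adenocarcinoma", "adenocarcinoma", "adenocarcinoma", "HGD", "adenocarcinoma"]

print(accuracy(reference, predicted))
print(sensitivity(reference, predicted, "adenocarcinoma"))
```

In this toy example the model over-calls adenocarcinoma once, which lowers accuracy but leaves sensitivity for adenocarcinoma untouched; the two measures answer different questions.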

Results: GPT-4 achieved an overall accuracy of 0.64 in identifying tumor imaging and tissue origins. For colon polyp classification, accuracy varied from 0.57 to 0.75 in different subtypes. The model achieved 0.88 accuracy in distinguishing low-grade from high-grade dysplasia and 0.75 in distinguishing high-grade dysplasia from adenocarcinoma, with a high sensitivity in detecting adenocarcinoma. Consistency between initial and follow-up evaluations showed slight to moderate agreement, with Kappa values ranging from 0.204 to 0.375.
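For readers unfamiliar with the Kappa statistic, the sketch below shows how Cohen's kappa is commonly computed for two rounds of labels on the same cases. It assumes the unweighted two-rater form; the abstract does not state which variant the authors used, and the function name and labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(first, second):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    if len(first) != len(second) or not first:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(first)
    # Observed agreement: share of cases labeled identically in both rounds.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    counts_first, counts_second = Counter(first), Counter(second)
    expected = sum(counts_first[k] * counts_second[k] for k in counts_first) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Made-up labels for illustration only; these are NOT data from the study.
initial = ["low-grade", "low-grade", "high-grade", "high-grade"]
follow_up = ["low-grade", "high-grade", "high-grade", "high-grade"]

print(cohen_kappa(initial, follow_up))
```

Kappa discounts the agreement expected by chance, so two rounds that match on 75% of cases can still score well below 0.75.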

Conclusion: GPT-4 demonstrates the ability to diagnose pathological images, showing improved performance over earlier versions. Its diagnostic accuracy in cancer is comparable to that of pathology residents. These findings suggest that GPT-4 holds promise as a supportive tool in pathology diagnostics, offering the potential to assist pathologists in routine diagnostic workflows.

Source journal
Frontiers in Medicine (Medicine: General Medicine)
CiteScore: 5.10
Self-citation rate: 5.10%
Articles published: 3710
Review time: 12 weeks
Journal description: Frontiers in Medicine publishes rigorously peer-reviewed research linking basic research to clinical practice and patient care, as well as translating scientific advances into new therapies and diagnostic tools. Led by an outstanding Editorial Board of international experts, this multidisciplinary open-access journal is at the forefront of disseminating and communicating scientific knowledge and impactful discoveries to researchers, academics, clinicians and the public worldwide. In addition to papers that provide a link between basic research and clinical practice, a particular emphasis is given to studies that are directly relevant to patient care. In this spirit, the journal publishes the latest research results and medical knowledge that facilitate the translation of scientific advances into new therapies or diagnostic tools. The full listing of the Specialty Sections represented by Frontiers in Medicine is as listed below. As well as the established medical disciplines, Frontiers in Medicine is launching new sections that together will facilitate:
- the use of patient-reported outcomes under real world conditions
- the exploitation of big data and the use of novel information and communication tools in the assessment of new medicines
- the scientific bases for guidelines and decisions from regulatory authorities
- access to medicinal products and medical devices worldwide
- addressing the grand health challenges around the world