Potential of multimodal large language models for data mining of medical images and free-text reports

Yutong Zhang, Yi Pan, Tianyang Zhong, Peixin Dong, Kangni Xie, Yuxiao Liu, Hanqi Jiang, Zihao Wu, Zhengliang Liu, Wei Zhao, Wei Zhang, Shijie Zhao, Tuo Zhang, Xi Jiang, Dinggang Shen, Tianming Liu, Xin Zhang
Meta-Radiology, vol. 2, no. 4, Article 100103 (published 2024-09-21). DOI: 10.1016/j.metrad.2024.100103. https://www.sciencedirect.com/science/article/pii/S2950162824000572

Abstract

Medical images and radiology reports are essential for physicians to diagnose medical conditions. However, the great diversity and cross-source heterogeneity of these data pose significant challenges to the generalizability of current data-mining methods for clinical decision-making. Recently, multimodal large language models (MLLMs), especially the Gemini-Vision-series (Gemini) and GPT-4-series (GPT-4) models, have transformed numerous domains, with a significant impact on the medical field. In this study, we conducted a detailed evaluation of Gemini-series models (Gemini-1.0-Pro-Vision, Gemini-1.5-Pro, and Gemini-1.5-Flash) and GPT-series models (GPT-4o, GPT-4-Turbo, and GPT-3.5-Turbo) across 14 medical datasets, covering 5 medical imaging categories (dermatology, radiology, dentistry, ophthalmology, and endoscopy) and 3 radiology report datasets. The tasks investigated were disease classification, lesion segmentation, anatomical localization, disease diagnosis, report generation, and lesion detection. We also evaluated Claude-3-Opus, Yi-Large, Yi-Large-Turbo, and LLaMA 3 to obtain a broader picture of how MLLMs perform in the medical field. Our experimental results showed that Gemini-series models excelled in report generation and lesion detection but struggled with disease classification and anatomical localization. Conversely, GPT-series models performed well in lesion segmentation and anatomical localization but had difficulty with disease diagnosis and lesion detection. In addition, both the Gemini and GPT series include models that demonstrated commendable generation efficiency. Although both model series hold promise for reducing physician workload, easing pressure on limited healthcare resources, and fostering collaboration between clinicians and artificial intelligence technologies, substantial improvements and comprehensive validation are still required before clinical deployment.