Harnessing large multimodal models in pulmonary CT: the generative AI edge in lung cancer diagnostics

IF 8.1 | Zone 1, Medicine | Q1 HEALTH CARE SCIENCES & SERVICES | The Lancet Regional Health: Western Pacific | Pub Date: 2025-02-01 | DOI: 10.1016/j.lanwpc.2024.101336

Lihaoyun Huang, Junyi Shen, Anqi Lin, Jian Zhang, Peng Luo, Ting Wei

The Lancet Regional Health: Western Pacific, volume 55, Article 101336. Journal Article. https://www.sciencedirect.com/science/article/pii/S2666606524003304


Abstract

Background

Generative Artificial Intelligence (Gen-AI) has rapidly advanced in multimodal information processing, particularly in medical applications such as the refinement of instruments and the interpretation of medical images. However, limited evidence exists on the diagnostic performance of Gen-AI models in tumor recognition, particularly using computed tomography (CT) images. This study aimed to evaluate the diagnostic capabilities of several prevalent Gen-AI models (GPT-4-turbo, Gemini-pro-vision, Claude-3-opus) in the context of lung CT image analysis.

Methods

This retrospective study analyzed chest CT scans from 404 patients with lung conditions, including patients with lung neoplasms (n=184) and with non-malignant conditions (n=210). After standardizing the CT images, the diagnostic performance and reliability of three Gen-AI models (GPT-4-turbo, Gemini-pro-vision, and Claude-3-opus) were assessed using chi-square tests and Receiver Operating Characteristic (ROC) curves across various clinical scenarios. Likert scale scoring and response rate analysis were employed to evaluate internal diagnostic tendencies, while regression analyses were conducted for model optimization.
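As an illustration of the kind of chi-square comparison mentioned above, the following is a minimal sketch in plain Python, assuming a 2x2 table of correct versus incorrect diagnoses for two models. The function name `chi_square_2x2` and the counts are hypothetical and are not taken from the study; the abstract does not specify which tables were tested or whether a continuity correction was applied.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
    [[a, b], [c, d]]. Returns (statistic, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    # Closed-form statistic for a 2x2 table.
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts, NOT study data:
# model A: 90 correct, 10 incorrect; model B: 70 correct, 30 incorrect.
stat, p = chi_square_2x2(90, 10, 70, 30)
print(f"chi2 = {stat:.2f}, p = {p:.5f}")
```

A small p-value here would only indicate that the two hypothetical accuracy rates differ by more than chance would suggest; it says nothing about which errors each model makes.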

Findings

In a cueing environment limited to a single CT image, Gemini demonstrated the highest diagnostic accuracy (92.21%), followed by Claude (91.49%), while GPT exhibited the lowest performance (65.22%). As the complexity of the cueing environment increased, all models experienced a decline in diagnostic accuracy. Claude showed a marginal decrease, whereas Gemini's accuracy fluctuated significantly. Under simplified cueing conditions, the performance of all models improved notably (Gemini AUC = 0.76, Claude AUC = 0.69, GPT AUC = 0.73). Feature identification analysis revealed that Claude and GPT excelled in recognizing key features, particularly prioritizing “Morphology/Margins” when diagnosing primary malignancies, with “spiculated” and “irregular” serving as critical indicators. However, in cases of misdiagnosis or missed diagnosis, the Gen-AI models exhibited significant deviations across multiple feature dimensions, and some descriptions completely contradicted the actual findings. Following optimization through Lasso and stepwise regression, the diagnostic performance of the models was significantly enhanced (AUC = 0.896 and AUC = 0.894, respectively).
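To make the AUC figures above concrete, here is a minimal sketch of how an AUC can be computed from ordinal ratings, assuming each case receives a 5-point Likert score for suspected malignancy. The helper `auc_from_scores` and the example ratings are hypothetical; the abstract does not describe how the study's ROC curves were actually constructed.

```python
def auc_from_scores(scores, labels):
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case, with ties counted as one
    half (equivalent to Mann-Whitney U divided by n_pos * n_neg)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical ratings, NOT study data (5 = "definitely malignant").
likert = [5, 4, 4, 2, 3, 1, 2, 4]
truth = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = malignant, 0 = non-malignant
auc = auc_from_scores(likert, truth)
print(f"AUC = {auc:.3f}")
```

The pairwise form is quadratic in the number of cases, which is acceptable for a few hundred scans; an AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation.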

Interpretation

Gen-AI models show promise in pulmonary CT imaging, particularly in simplified diagnostic settings. However, their limitations in processing complex multimodal information highlight significant challenges for clinical integration. Ongoing efforts to improve the robustness and reliability of these models are crucial for their successful adoption in healthcare.




Source journal

The Lancet Regional Health: Western Pacific (Medicine: Pediatrics, Perinatology and Child Health)

CiteScore

8.80

Self-citation rate

2.80%

Articles published

305

Review time

11 weeks

About the journal: The Lancet Regional Health – Western Pacific, a gold open access journal, is an integral part of The Lancet's global initiative advocating for healthcare quality and access worldwide. It aims to advance clinical practice and health policy in the Western Pacific region, contributing to enhanced health outcomes. The journal publishes high-quality original research shedding light on clinical practice and health policy in the region. It also includes reviews, commentaries, and opinion pieces covering diverse regional health topics, such as infectious diseases, non-communicable diseases, child and adolescent health, maternal and reproductive health, aging health, mental health, the health workforce and systems, and health policy.