Evaluating generative AI models for explainable pathological feature extraction in lung adenocarcinoma: grading assessment and prognostic model construction

IF 7.6 · Zone 1 (Medicine) · Q1 HEALTH CARE SCIENCES & SERVICES · The Lancet Regional Health: Western Pacific · Pub Date: 2025-02-01 · DOI: 10.1016/j.lanwpc.2024.101352

Junyi Shen, Anqi Lin, Ting Wei, Jian Zhang, Peng Luo

The Lancet Regional Health: Western Pacific, volume 55, Article 101352. Journal Article. https://www.sciencedirect.com/science/article/pii/S2666606524003468

Citations: 0

Abstract

Background

With the widespread application of generative AI (GenAI) models, it is crucial to systematically evaluate their performance in lung adenocarcinoma histopathological assessment. This study aimed to evaluate and compare the performance of three GenAI models with visual capabilities (GPT-4o, Claude-3.5-Sonnet, and Gemini-1.5-Pro) in lung adenocarcinoma histological pattern recognition and grading, and to explore the construction of prognostic prediction models based on GenAI feature extraction.

Methods

This retrospective study extracted 310 diagnostic slides from the TCGA-LUAD database for model evaluation. An additional 87 diagnostic pathology slides from local lung adenocarcinoma surgical patients were used for external validation of the prognostic model. Primary outcomes were GenAI grading accuracy and stability, measured by the area under the receiver operating characteristic curve (AUC) and the intraclass correlation coefficient (ICC), respectively. Secondary outcomes included the construction and assessment of machine learning-based prognostic prediction models that used features extracted by GenAI, with model performance evaluated using the concordance index (C-index).
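To make the two discrimination metrics named above concrete, here is a minimal sketch in plain Python. It is not the study's code: the abstract does not say which software or estimator variants were used, so the function names `auc_from_scores` and `harrell_c_index`, the simplified handling of ties, and the toy data are all assumptions made for illustration.

```python
from itertools import combinations


def auc_from_scores(labels, scores):
    """AUC as the chance that a positive case scores above a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def harrell_c_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    A pair is comparable when the subject with the shorter follow-up had an
    event. Pairs with tied times are skipped (a simplification, assumed here).
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # a is the subject with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # tied times, or the earlier subject was censored
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # higher predicted risk, earlier event
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, invented for illustration only (not from the study).
toy_auc = auc_from_scores([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
toy_c = harrell_c_index(times=[5, 10, 15, 20],
                        events=[1, 1, 0, 1],
                        risks=[0.9, 0.4, 0.6, 0.1])
```

Both metrics share the same reading: 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is the scale on which the reported AUC and C-index values sit.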

Findings

Claude-3.5-Sonnet demonstrated the best overall performance, combining high grading accuracy (average AUC = 0.82) with moderate stability (ICC = 0.59). The optimal machine learning-based prognostic model, constructed using features extracted by Claude-3.5-Sonnet and incorporating clinical variables, showed good performance in both internal and external validation, with an average C-index of 0.72. Meta-analysis demonstrated that this prognostic model effectively stratified patients into risk groups, with the high-risk group showing significantly worse outcomes (hazard ratio = 6.44, 95% confidence interval = 3.42-12.14).
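The reported hazard ratio and its interval can be sanity-checked with a few lines of arithmetic. The sketch below assumes the interval is a Wald-type 95% interval built on the log scale, that is, log(HR) ± 1.96 × SE. The abstract does not state how the interval was derived, so this is an assumption, and the helper name `check_log_symmetry` is hypothetical.

```python
import math


def check_log_symmetry(hr, lower, upper, z_crit=1.96):
    """Back out the implied point estimate, SE, and z-score from a ratio CI.

    Assumes a Wald-type interval: log(HR) +/- z_crit * SE.
    """
    # On the log scale the interval is symmetric, so the geometric mean of
    # the bounds should recover the point estimate.
    implied_hr = math.sqrt(lower * upper)
    # Half the log-width of the interval, divided by the critical value.
    se_log_hr = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    # Distance of log(HR) from the null value log(1) = 0, in SE units.
    z_score = math.log(hr) / se_log_hr
    return implied_hr, se_log_hr, z_score


# Values as reported in the Findings.
implied_hr, se_log_hr, z_score = check_log_symmetry(6.44, 3.42, 12.14)
```

Under that assumption, the geometric mean of the bounds lands within rounding of the reported 6.44, so the point estimate and interval are mutually consistent, and the implied z-score is well above 1.96, in line with the description of the difference as significant.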

Interpretation

This study demonstrates the potential application value of GenAI models in lung adenocarcinoma histopathological assessment. Claude-3.5-Sonnet demonstrated the highest grading accuracy, and the machine learning-based prognostic model that utilized its feature extraction showed good predictive capabilities. These findings provide new research directions for AI-assisted pathological diagnosis and prognostic prediction, with the potential to improve the management of lung adenocarcinoma patients.


Source journal

The Lancet Regional Health: Western Pacific Medicine-Pediatrics, Perinatology and Child Health

CiteScore: 8.80

Self-citation rate: 2.80%

Articles published: 305

Review time: 11 weeks

About the journal: The Lancet Regional Health – Western Pacific, a gold open access journal, is an integral part of The Lancet's global initiative advocating for healthcare quality and access worldwide. It aims to advance clinical practice and health policy in the Western Pacific region, contributing to enhanced health outcomes. The journal publishes high-quality original research shedding light on clinical practice and health policy in the region. It also includes reviews, commentaries, and opinion pieces covering diverse regional health topics, such as infectious diseases, non-communicable diseases, child and adolescent health, maternal and reproductive health, aging health, mental health, the health workforce and systems, and health policy.