Evaluating generative AI models for explainable pathological feature extraction in lung adenocarcinoma: grading assessment and prognostic model construction
Junyi Shen, Anqi Lin, Ting Wei, Jian Zhang, Peng Luo
Journal: The Lancet Regional Health: Western Pacific, volume 55, Article 101352
Publication date: 2025-02-01
DOI: 10.1016/j.lanwpc.2024.101352
URL: https://www.sciencedirect.com/science/article/pii/S2666606524003468
Citations: 0
Abstract
Background
With the widespread application of generative AI (GenAI) models, it is crucial to systematically evaluate their performance in lung adenocarcinoma histopathological assessment. This study aimed to evaluate and compare the performance of three GenAI models with visual capabilities (GPT-4o, Claude-3.5-Sonnet, and Gemini-1.5-Pro) in lung adenocarcinoma histological pattern recognition and grading, and to explore the construction of prognostic prediction models based on GenAI feature extraction.
Methods
This retrospective study extracted 310 diagnostic slides from the TCGA-LUAD database for model evaluation. An additional 87 diagnostic pathology slides from local lung adenocarcinoma surgical patients were used for external validation of the prognostic model. Primary outcomes were GenAI grading accuracy and stability, measured by the area under the receiver operating characteristic curve (AUC) and the intraclass correlation coefficient (ICC), respectively. Secondary outcomes included the construction and assessment of machine learning-based prognostic prediction models using features extracted by GenAI, with model performance evaluated using the concordance index (C-index).
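For readers unfamiliar with the C-index named above, the following is a minimal sketch of Harrell's concordance index in plain Python. It is an illustration only, not the authors' implementation: the function name `concordance_index`, the toy data, and the decision to skip pairs with tied survival times are all assumptions made here for brevity.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ranked correctly by risk.

    times       -- observed follow-up times
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- model output, higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the subject with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Assumption: pairs with tied times are skipped, and a pair is only
        # comparable when the shorter time ends in an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk failed earlier: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data, not taken from the study.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.6, 0.7, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_risk)
print(f"C-index on toy data: {toy_c:.2f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for interpreting a C-index reported on this scale.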
Findings
Claude-3.5-Sonnet demonstrated the best overall performance, combining high grading accuracy (average AUC = 0.82) with moderate stability (ICC = 0.59). The optimal machine learning-based prognostic model, constructed using features extracted by Claude-3.5-Sonnet and incorporating clinical variables, showed good performance in both internal and external validation, with an average C-index of 0.72. Meta-analysis demonstrated that this prognostic model effectively stratified patients into risk groups, with the high-risk group showing significantly worse outcomes (hazard ratio = 6.44, 95% confidence interval = 3.42-12.14).
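As a quick arithmetic check on the reported hazard ratio, the sketch below assumes the 95% confidence interval is a Wald-type interval that is symmetric on the log scale. The abstract does not state how the interval was computed, so this is an assumption, and the helper name `log_scale_summary` is hypothetical. Under that assumption, the geometric midpoint of the interval should recover the point estimate, and the interval width gives an approximate standard error of the log hazard ratio.

```python
import math


def log_scale_summary(hr, lower, upper, z_crit=1.96):
    """Return (midpoint, se, z) assuming a log-symmetric Wald interval."""
    log_lower, log_upper = math.log(lower), math.log(upper)
    # Standard error of log(HR) implied by the interval width.
    se = (log_upper - log_lower) / (2 * z_crit)
    # Geometric midpoint of the interval, back on the hazard-ratio scale.
    midpoint = math.exp((log_lower + log_upper) / 2)
    # Approximate Wald z statistic for log(HR) against 0.
    z = math.log(hr) / se
    return midpoint, se, z


# Values as reported in the Findings paragraph.
midpoint, se, z = log_scale_summary(6.44, 3.42, 12.14)
print(f"midpoint = {midpoint:.2f}, SE(log HR) = {se:.3f}, z = {z:.2f}")
```

The midpoint agrees with the reported estimate of 6.44 to two decimal places, so the point estimate and interval are internally consistent under the stated assumption.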
Interpretation
This study demonstrates the potential application value of GenAI models in lung adenocarcinoma histopathological assessment. Claude-3.5-Sonnet demonstrated the highest grading accuracy, and the machine learning-based prognostic model that utilized its feature extraction showed good predictive capabilities. These findings provide new research directions for AI-assisted pathological diagnosis and prognostic prediction, with the potential to improve the management of lung adenocarcinoma patients.
About the journal:
The Lancet Regional Health – Western Pacific, a gold open access journal, is an integral part of The Lancet's global initiative advocating for healthcare quality and access worldwide. It aims to advance clinical practice and health policy in the Western Pacific region, contributing to enhanced health outcomes. The journal publishes high-quality original research shedding light on clinical practice and health policy in the region. It also includes reviews, commentaries, and opinion pieces covering diverse regional health topics, such as infectious diseases, non-communicable diseases, child and adolescent health, maternal and reproductive health, aging health, mental health, the health workforce and systems, and health policy.