Assessing the accuracy of the GPT-4 model in multidisciplinary tumor board decision prediction.

IF 2.5 | CAS Zone 3 (Medicine) | JCR Q2 (Oncology) | Clinical & Translational Oncology | Pub Date: 2025-09-01 | Epub Date: 2025-03-25 | DOI: 10.1007/s12094-025-03905-1

Efe Cem Erdat, Merih Yalçıner, Mehmet Berk Örüncü, Yüksel Ürün, Filiz Çay Şenler

Clinical & Translational Oncology, pages 3793-3802. Publication type: Journal Article. PDF (PubMed Central): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12399707/pdf/

Citations: 0

Abstract

Purpose: Artificial intelligence models like GPT-4 (OpenAI) have the potential to support clinical decision-making in oncology. This study aimed to assess the consistency between multidisciplinary tumor board (MTB) decisions and GPT-4 model predictions in cancer patient management.

Patients and methods: A cross-sectional study was conducted involving patients aged ≥ 18 years with definite or suspected cancer diagnoses presented at MTBs at Ankara University Hospitals, Türkiye, from February 2021 to June 2023. GPT-4 was used to generate treatment recommendations based on case summaries. Three independent raters evaluated the compatibility between MTB decisions and GPT-4 predictions using a 4-point Likert scale. Cases with mean compatibility scores equal to or below 2 were reviewed by two expert oncologists for appropriateness.
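The review rule described in the methods (average the three raters' scores, then send cases with a mean of 2 or less to expert review) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `flag_for_review`, the case IDs, and the scores are all made up for the example.

```python
from statistics import mean

# From the abstract: cases with a mean compatibility score
# "equal to or below 2" were sent to expert review.
THRESHOLD = 2.0


def flag_for_review(ratings_by_case):
    """Return the IDs of cases whose mean rater score is <= THRESHOLD.

    ratings_by_case maps a case ID to the scores given by the raters
    on the 4-point scale (1 = lowest compatibility, 4 = highest).
    """
    return [
        case_id
        for case_id, scores in ratings_by_case.items()
        if mean(scores) <= THRESHOLD
    ]


# Hypothetical scores from three raters (NOT study data).
example = {
    "case_a": (4, 4, 3),
    "case_b": (2, 2, 2),
    "case_c": (1, 2, 2),
    "case_d": (3, 2, 2),
}
flagged = flag_for_review(example)
print(flagged)
```

With these made-up scores, `case_b` (mean exactly 2) and `case_c` are flagged, while `case_d` (mean about 2.33) is not.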

Results: A total of 610 patients were included. The mean compatibility score across raters was 3.59 (SD = 0.81), indicating high agreement between GPT-4 predictions and MTB decisions. Cronbach's alpha was 0.950 (95% CI 0.935-0.960), demonstrating excellent interrater reliability. Sixty-two cases (10.2%) had mean compatibility scores at or below the threshold of 2. The first expert oncologist deemed GPT-4's predictions inappropriate in 8 of these cases (12.9%), while the second deemed them inappropriate in 16 cases (25.8%). Cohen's kappa showed moderate agreement between the two experts (κ = 0.50, 95% CI 0.25-0.75, p < 0.001). Discrepancies were often due to rare cases lacking guideline information or misunderstandings of case presentations.
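The percentages reported above follow directly from the counts, and the two agreement statistics have simple closed forms. The sketch below recomputes the percentages and shows textbook implementations of Cronbach's alpha and Cohen's kappa. It assumes the standard formulas with sample variances; the abstract does not state which software or variant the authors used, and the toy ratings are invented for illustration.

```python
from collections import Counter
from statistics import variance


def cronbach_alpha(scores_by_rater):
    """Cronbach's alpha, treating each rater as an 'item'.

    scores_by_rater: list of equal-length score lists, one per rater.
    alpha = k/(k-1) * (1 - sum(per-rater variances) / variance(case totals))
    """
    k = len(scores_by_rater)
    totals = [sum(case) for case in zip(*scores_by_rater)]
    item_var = sum(variance(r) for r in scores_by_rater)
    return k / (k - 1) * (1 - item_var / variance(totals))


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same cases."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of the raters' marginal proportions.
    expected = sum(
        (freq_a[c] / n) * (freq_b[c] / n) for c in set(freq_a) | set(freq_b)
    )
    return (observed - expected) / (1 - expected)


# Percentages recomputed from the counts given in the abstract.
pct_flagged = round(100 * 62 / 610, 1)   # cases sent to expert review
pct_expert1 = round(100 * 8 / 62, 1)     # inappropriate per first expert
pct_expert2 = round(100 * 16 / 62, 1)    # inappropriate per second expert

# Toy data (NOT study data): 1 = inappropriate, 0 = appropriate.
toy_kappa = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])
# Three raters giving identical scores are perfectly consistent.
toy_alpha = cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])
```

The recomputed percentages match the reported 10.2%, 12.9%, and 25.8%. Reproducing the reported alpha and kappa values would require the per-case ratings, which the abstract does not provide.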

Conclusion: GPT-4 exhibited high compatibility with MTB decisions in cancer patient management, suggesting its potential as a supportive tool in clinical oncology. However, limitations exist, especially in rare or complex cases.

Source journal

Clinical & Translational Oncology (Medicine: Oncology)

CiteScore

6.20

Self-citation rate

2.90%

Articles published

240

Review time

1 month

About the journal: Clinical and Translational Oncology is an international journal devoted to fostering interaction between experimental and clinical oncology. It covers all aspects of research on cancer, from the more basic discoveries dealing with both cell and molecular biology of tumour cells, to the most advanced clinical assays of conventional and new drugs. In addition, the journal has a strong commitment to facilitating the transfer of knowledge from the basic laboratory to clinical practice, with the publication of educational series devoted to closing the gap between molecular and clinical oncologists. Molecular biology of tumours, identification of new targets for cancer therapy, and new technologies for research and treatment of cancer are the major themes covered by the educational series. Full research articles on a broad spectrum of subjects, including the molecular and cellular bases of disease, aetiology, pathophysiology, pathology, epidemiology, clinical features, and the diagnosis, prognosis and treatment of cancer, will be considered for publication.