Assessing the accuracy of the GPT-4 model in multidisciplinary tumor board decision prediction
Efe Cem Erdat, Merih Yalçıner, Mehmet Berk Örüncü, Yüksel Ürün, Filiz Çay Şenler
Clinical & Translational Oncology, pp. 3793-3802. Published 2025-09-01 (Epub 2025-03-25). Journal Article.
DOI: https://doi.org/10.1007/s12094-025-03905-1
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12399707/pdf/
Abstract
Purpose: Artificial intelligence models like GPT-4 (OpenAI) have the potential to support clinical decision-making in oncology. This study aimed to assess the consistency between multidisciplinary tumor board (MTB) decisions and GPT-4 model predictions in cancer patient management.
Patients and methods: A cross-sectional study was conducted involving patients aged ≥ 18 years with confirmed or suspected cancer diagnoses presented at MTBs in Ankara University Hospitals, Türkiye, from February 2021 to June 2023. GPT-4 was used to generate treatment recommendations based on case summaries. Three independent raters evaluated the compatibility between MTB decisions and GPT-4 predictions using a 4-point Likert scale. Cases with mean compatibility scores equal to or below 2 were reviewed by two expert oncologists for appropriateness.
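The review rule described above (mean compatibility score across raters equal to or below 2) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the case identifiers, and the example ratings are all hypothetical, and only the threshold of 2 comes from the abstract.

```python
from statistics import mean

# From the abstract: cases with a mean score <= 2 go to expert review.
REVIEW_THRESHOLD = 2


def flag_for_review(ratings_by_case, threshold=REVIEW_THRESHOLD):
    """Return the ids of cases whose mean rating across raters is <= threshold.

    ratings_by_case maps a case id to the 4-point Likert scores (1-4)
    given by each rater. Hypothetical helper, for illustration only.
    """
    return [
        case_id
        for case_id, scores in ratings_by_case.items()
        if mean(scores) <= threshold
    ]


# Hypothetical ratings from three raters (not data from the study).
example_ratings = {
    "case_a": (4, 4, 3),  # mean 3.67, not flagged
    "case_b": (2, 2, 2),  # mean 2.00, flagged (threshold is inclusive)
    "case_c": (1, 2, 2),  # mean 1.67, flagged
    "case_d": (3, 2, 2),  # mean 2.33, not flagged
}

flagged = flag_for_review(example_ratings)
print(flagged)
```

Note that the comparison is inclusive, matching the "equal to or below 2" wording of the methods.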
Results: A total of 610 patients were included. The mean compatibility score across raters was 3.59 (SD = 0.81), indicating high agreement between GPT-4 predictions and MTB decisions. Cronbach's alpha was 0.950 (95% CI 0.935-0.960), demonstrating excellent interrater reliability. Sixty-two cases (10.2%) had mean compatibility scores at or below the threshold of 2. The first expert oncologist deemed GPT-4's predictions inappropriate in 8 of these cases (12.9%), while the second deemed them inappropriate in 16 cases (25.8%). Cohen's kappa showed moderate agreement (κ = 0.50, 95% CI 0.25-0.75, p < 0.001). Discrepancies were often due to rare cases lacking guideline information or misunderstandings of case presentations.
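For readers unfamiliar with the two agreement statistics reported here, the sketch below shows the standard textbook formulas for Cronbach's alpha and Cohen's kappa. It is not the authors' analysis code. The marginal counts (8 and 16 inappropriate out of 62) come from the abstract, but the cell counts of the 2×2 table, in particular the overlap of 7 cases that both oncologists rated inappropriate, are an assumption chosen only because it is consistent with those marginals; the abstract does not report the table.

```python
from statistics import variance


def cronbach_alpha(ratings_by_rater):
    """Cronbach's alpha for k raters who scored the same n cases.

    ratings_by_rater is a list of k equal-length score lists.
    alpha = k/(k-1) * (1 - sum(per-rater variances) / variance(case totals))
    """
    k = len(ratings_by_rater)
    rater_variances = sum(variance(scores) for scores in ratings_by_rater)
    case_totals = [sum(case_scores) for case_scores in zip(*ratings_by_rater)]
    return k / (k - 1) * (1 - rater_variances / variance(case_totals))


def cohens_kappa(both_yes, only_a, only_b, both_no):
    """Cohen's kappa for two raters making a binary judgement.

    The arguments are the four cells of the 2x2 agreement table.
    kappa = (p_observed - p_expected) / (1 - p_expected)
    """
    n = both_yes + only_a + only_b + both_no
    p_observed = (both_yes + both_no) / n
    a_yes = (both_yes + only_a) / n  # rater A's rate of "inappropriate"
    b_yes = (both_yes + only_b) / n  # rater B's rate of "inappropriate"
    p_expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    return (p_observed - p_expected) / (1 - p_expected)


# ASSUMED table: marginals 8/62 and 16/62 are from the abstract; the
# overlap of 7 is hypothetical and not reported by the authors.
kappa = cohens_kappa(both_yes=7, only_a=1, only_b=9, both_no=45)
print(round(kappa, 2))

# Sanity check on alpha with hypothetical, perfectly consistent raters.
identical = [[1, 2, 3, 4, 4]] * 3
alpha = cronbach_alpha(identical)
print(round(alpha, 3))
```

Under the assumed table the computed kappa rounds to the reported 0.50, which shows the reported marginals and kappa are mutually compatible; it does not establish that this was the actual table.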
Conclusion: GPT-4 exhibited high compatibility with MTB decisions in cancer patient management, suggesting its potential as a supportive tool in clinical oncology. However, limitations exist, especially in rare or complex cases.
About the journal:
Clinical and Translational Oncology is an international journal devoted to fostering interaction between experimental and clinical oncology. It covers all aspects of research on cancer, from the more basic discoveries dealing with both cell and molecular biology of tumour cells, to the most advanced clinical assays of conventional and new drugs. In addition, the journal has a strong commitment to facilitating the transfer of knowledge from the basic laboratory to the clinical practice, with the publication of educational series devoted to closing the gap between molecular and clinical oncologists. Molecular biology of tumours, identification of new targets for cancer therapy, and new technologies for research and treatment of cancer are the major themes covered by the educational series. Full research articles on a broad spectrum of subjects, including the molecular and cellular bases of disease, aetiology, pathophysiology, pathology, epidemiology, clinical features, and the diagnosis, prognosis and treatment of cancer, will be considered for publication.