ChatGPT in glioma adjuvant therapy decision making: ready to assume the role of a doctor in the tumour board?

BMJ Health & Care Informatics | Published: 2023-06-01 | DOI: 10.1136/bmjhci-2023-100775 | IF 4.1 | JCR Q1 (Health Care Sciences & Services)

Julien Haemmerli, Lukas Sveikata, Aria Nouri, Adrien May, Kristof Egervari, Christian Freyschlag, Johannes A Lobrinus, Denis Migliorini, Shahan Momjian, Nicolae Sanda, Karl Schaller, Sebastien Tran, Jacky Yeung, Philippe Bijlenga

Open-access PDF (PMC10314415): https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/8e/7e/bmjhci-2023-100775.PMC10314415.pdf

Citations: 9

Abstract





Objective: To evaluate ChatGPT's performance in brain glioma adjuvant therapy decision-making.

Methods: We randomly selected 10 patients with brain gliomas discussed at our institution's central nervous system tumour board (CNS TB). Patients' clinical status, surgical outcome, textual imaging information and immunopathology results were provided to ChatGPT V.3.5 and to seven CNS tumour experts. The chatbot was asked to give the adjuvant treatment choice and the regimen, taking the patient's functional status into account. The experts rated the artificial intelligence-based recommendations from 0 (complete disagreement) to 10 (complete agreement). The intraclass correlation coefficient (ICC) for agreement was used to measure inter-rater agreement.
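The abstract does not state which ICC form was used. As an illustration only, the sketch below assumes the two-way random-effects, absolute-agreement, single-rater form: ICC(2,1) in Shrout and Fleiss notation, or ICC(A,1) in McGraw and Wong's. The function name `icc_a1` and the data layout are hypothetical. Each row is one rated subject (a patient's recommendation), and each column is one rater (an expert). The function returns only a point estimate. It does not compute the 95% confidence intervals reported in the abstract.

```python
from statistics import mean


def icc_a1(ratings: list[list[float]]) -> float:
    """Single-rater, absolute-agreement ICC from a two-way random-effects ANOVA.

    Assumed (hypothetical) layout: ratings[i][j] is the 0-10 score that
    expert j gave to the recommendation for patient i.
    """
    n = len(ratings)  # subjects (e.g. patients)
    if n < 2:
        raise ValueError("need at least two rated subjects")
    k = len(ratings[0])  # raters (e.g. experts)
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k matrix with at least two raters")

    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(ratings[i][j] for i in range(n)) for j in range(k)]

    # Partition the total sum of squares into subjects, raters and residual.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    # Absolute agreement: systematic rater offsets (ms_cols) count as disagreement.
    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    if denom == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_rows - ms_err) / denom


if __name__ == "__main__":
    # Classic 6 subjects x 4 judges example from Shrout and Fleiss (1979).
    example = [
        [9, 2, 5, 8],
        [6, 1, 3, 2],
        [8, 4, 6, 8],
        [7, 1, 2, 6],
        [10, 5, 6, 9],
        [6, 2, 4, 7],
    ]
    print(round(icc_a1(example), 2))  # 0.29
```

This form treats a constant offset between raters as disagreement. For example, if one expert consistently scores two points higher than the others, the ICC goes down. A consistency-type ICC would ignore such an offset.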

Results: Eight patients (80%) met the criteria for glioblastoma and two (20%) were low-grade gliomas. The experts rated the quality of ChatGPT recommendations as poor for diagnosis (median 3, IQR 1-7.8, ICC 0.9, 95% CI 0.7 to 1.0), good for treatment recommendation (7, IQR 6-8, ICC 0.8, 95% CI 0.4 to 0.9), good for therapy regimen (7, IQR 4-8, ICC 0.8, 95% CI 0.5 to 0.9), moderate for functional status consideration (6, IQR 1-7, ICC 0.7, 95% CI 0.3 to 0.9) and moderate for overall agreement with the recommendations (5, IQR 3-7, ICC 0.7, 95% CI 0.3 to 0.9). No differences were observed between the ratings for glioblastomas and those for low-grade gliomas.

Conclusions: ChatGPT performed poorly in classifying glioma types but was good for adjuvant treatment recommendations as evaluated by CNS TB experts. Even though ChatGPT lacks the precision to replace expert opinion, it may serve as a promising supplemental tool within a human-in-the-loop approach.


Source journal

BMJ Health & Care Informatics

CiteScore: 6.10

Self-citation rate: 4.90%

Review time: 18 weeks