Assessing the performance of ChatGPT in medical ethical decision-making: a comparative study with USMLE-based scenarios.

Journal of Medical Ethics (IF 3.4; Zone 2, Philosophy; JCR Q1, Ethics) Pub Date: 2025-09-22 DOI: 10.1136/jme-2024-110240

Ali A Khan, Ali R Khan, Saminah Munshi, Hari Dandapani, Mohamed Jimale, Franck M Bogni, Hussain Khawaja

Pages: 693-699

Citations: 0

Abstract

Introduction: The integration of artificial intelligence (AI) into healthcare introduces innovative possibilities but raises ethical, legal and professional concerns. Assessing the performance of AI in core components of the United States Medical Licensing Examination (USMLE), such as communication skills, ethics, empathy and professionalism, is crucial. This study evaluates how well ChatGPT versions 3.5 and 4.0 handle complex medical scenarios using USMLE-Rx, AMBOSS and UWorld question banks, aiming to understand its ability to navigate patient interactions according to medical ethics and standards.

Methods: We compiled 273 questions from AMBOSS, USMLE-Rx and UWorld, focusing on communication, social sciences, healthcare policy and ethics. GPT-3.5 and GPT-4 were tasked with answering and justifying their choices in new chat sessions to minimise model interference. Responses were compared against question bank rationales and average student performance to evaluate AI effectiveness in medical ethical decision-making.

Results: GPT-3.5 answered 38.9% correctly in AMBOSS, 54.1% in USMLE-Rx and 57.4% in UWorld, with rationale accuracy rates of 83.3%, 90.0% and 87.0%, respectively. GPT-4 answered 75.9% correctly in AMBOSS, 64.9% in USMLE-Rx and 79.6% in UWorld, with rationale accuracy rates of 85.4%, 88.9%, and 98.8%, respectively. Both versions generally scored below average student performance, except GPT-4 in UWorld.
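
The gap between the two model versions can be read directly off the reported figures. The sketch below tabulates the answer accuracies quoted above and computes the percentage-point gain of GPT-4 over GPT-3.5 for each question bank. The names `reported_accuracy` and `accuracy_gain` are illustrative and not part of the study. The abstract does not give per-bank question counts, so no pooled accuracy across all 273 questions is attempted.

```python
# Answer accuracy (%) per question bank, copied from the Results paragraph.
# The dictionary layout and function name are illustrative assumptions.
reported_accuracy = {
    "AMBOSS": {"GPT-3.5": 38.9, "GPT-4": 75.9},
    "USMLE-Rx": {"GPT-3.5": 54.1, "GPT-4": 64.9},
    "UWorld": {"GPT-3.5": 57.4, "GPT-4": 79.6},
}


def accuracy_gain(scores):
    """Return the percentage-point gain of GPT-4 over GPT-3.5 per bank."""
    return {
        bank: round(s["GPT-4"] - s["GPT-3.5"], 1)
        for bank, s in scores.items()
    }


gains = accuracy_gain(reported_accuracy)
for bank, gain in gains.items():
    print(f"{bank}: +{gain} percentage points")
```

On these figures the improvement is largest in AMBOSS and smallest in USMLE-Rx.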

Conclusion: ChatGPT, particularly version 4.0, shows potential in navigating ethical and interpersonal medical scenarios. However, human reasoning currently surpasses AI in average performance. Continued development and training of AI systems can enhance proficiency in these critical healthcare aspects.

Source journal: Journal of Medical Ethics (category: Medicine, Medical Ethics)

CiteScore: 7.80
Self-citation rate: 9.80%
Articles published: 164
Review time: 4-8 weeks

About the journal: Journal of Medical Ethics is a leading international journal that reflects the whole field of medical ethics. The journal seeks to promote ethical reflection and conduct in scientific research and medical practice. It features articles on various ethical aspects of health care relevant to health care professionals, members of clinical ethics committees, medical ethics professionals, researchers and bioscientists, policy makers and patients. Subscribers to the Journal of Medical Ethics also receive the journal Medical Humanities at no extra cost. JME is the official journal of the Institute of Medical Ethics.
