Title: A randomized controlled trial on evaluating clinician-supervised generative AI for decision support
Authors: Rayan Ebnali Harari, Abdullah Altaweel, Tareq Ahram, Madeleine Keehner, Hamid Shokoohi
Journal: International Journal of Medical Informatics, volume 195, Article 105701 (impact factor 3.7; JCR Q2, Computer Science, Information Systems)
Published: 2024-11-29
DOI: 10.1016/j.ijmedinf.2024.105701
URL: https://www.sciencedirect.com/science/article/pii/S1386505624003642
Citations: 0
Abstract
Background
The integration of generative artificial intelligence (AI) as clinical decision support systems (CDSS) into telemedicine presents a significant opportunity to enhance clinical outcomes, yet its application remains underexplored.
Objective
This study investigates the efficacy of one of the most common generative AI tools, ChatGPT, for providing clinical guidance during cardiac arrest scenarios.
Methods
We compared performance, cognitive load, and trust across three conditions: a traditional paper guide, autonomous ChatGPT, and clinician-supervised ChatGPT, in which a clinician supervised the AI recommendations. Fifty-four participants without medical backgrounds took part in a randomized controlled trial, each assigned to one of the three intervention groups. Participants completed a standardized CPR scenario using an Augmented Reality (AR) headset, and performance, physiological, and self-reported metrics were recorded.
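The three-arm assignment described above can be illustrated with a minimal sketch. The abstract does not report the randomization procedure or the group sizes, so the equal allocation (18 per arm), the fixed seed, and the names `ARMS` and `assign_arms` are assumptions made purely for illustration.

```python
import random
from collections import Counter

# Arm labels taken from the abstract; equal allocation is an assumption.
ARMS = ("paper guide", "ChatGPT", "supervised ChatGPT")


def assign_arms(participant_ids, arms=ARMS, seed=0):
    """Shuffle participants, then deal them round-robin across the arms.

    Dealing after a shuffle keeps arm sizes within one of each other,
    while the order of assignment stays random.
    """
    rng = random.Random(seed)  # seeded so the allocation is reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    return {pid: arms[i % len(arms)] for i, pid in enumerate(shuffled)}


# Hypothetical participant IDs 1..54, matching the reported sample size.
allocation = assign_arms(range(1, 55))
arm_sizes = Counter(allocation.values())
```

With 54 participants and three arms, this scheme yields 18 participants per arm; a real trial would typically also conceal the allocation sequence from the people enrolling participants.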
Main Findings
Results indicate that the Supervised-ChatGPT group showed significantly higher decision accuracy compared to the paper guide and ChatGPT groups, although the scenario completion time was longer. Physiological data showed a reduced LF/HF ratio (the ratio of low-frequency to high-frequency heart rate variability power) in the Supervised-ChatGPT group, suggesting potentially lower cognitive load. Trust in AI was also highest in the supervised condition. In one instance, ChatGPT suggested a risky option, highlighting the need for clinician supervision.
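For readers unfamiliar with the LF/HF ratio, the sketch below shows one common way to estimate it from a series of RR intervals (times between heartbeats). It is not the authors' pipeline, which the abstract does not describe. The band limits (0.04 to 0.15 Hz and 0.15 to 0.40 Hz) follow the usual heart rate variability convention; the 4 Hz resampling rate, the plain periodogram, and all function names, including the `synthetic_rr` test-signal helper, are assumptions for illustration.

```python
import cmath
import math

LF_BAND = (0.04, 0.15)  # Hz, conventional low-frequency band
HF_BAND = (0.15, 0.40)  # Hz, conventional high-frequency band


def resample_rr(rr_s, fs=4.0):
    """Linearly interpolate RR intervals (seconds) onto an even time grid."""
    beat_times, t = [], 0.0
    for rr in rr_s:
        t += rr
        beat_times.append(t)  # each beat occurs at the cumulative interval sum
    n = int((beat_times[-1] - beat_times[0]) * fs)
    samples, j = [], 0
    for i in range(n):
        ti = beat_times[0] + i / fs
        while beat_times[j + 1] < ti:
            j += 1  # advance to the pair of beats that brackets ti
        frac = (ti - beat_times[j]) / (beat_times[j + 1] - beat_times[j])
        samples.append(rr_s[j] + frac * (rr_s[j + 1] - rr_s[j]))
    return samples


def band_power(samples, fs, band):
    """Sum periodogram power over DFT bins whose frequency lies in the band."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [v - mean for v in samples]  # remove the DC component
    lo, hi = band
    power = 0.0
    for k in range(1, n // 2 + 1):
        if lo <= k * fs / n < hi:
            coeff = sum(c * cmath.exp(-2j * math.pi * k * i / n)
                        for i, c in enumerate(centered))
            power += abs(coeff) ** 2
    return power


def lf_hf_ratio(rr_s, fs=4.0):
    """Ratio of low-frequency to high-frequency power of the RR series."""
    samples = resample_rr(rr_s, fs)
    return band_power(samples, fs, LF_BAND) / band_power(samples, fs, HF_BAND)


def synthetic_rr(freq_hz, n_beats=150):
    """Hypothetical test signal: RR intervals oscillating at one frequency."""
    rr, t = [], 0.0
    for _ in range(n_beats):
        r = 0.8 + 0.05 * math.sin(2 * math.pi * freq_hz * t)
        rr.append(r)
        t += r
    return rr
```

A series dominated by a 0.10 Hz oscillation gives a ratio above 1, while one dominated by a 0.25 Hz oscillation gives a ratio below 1. Production analyses usually add artifact correction, detrending, and a windowed estimator such as Welch's method.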
Conclusion
Our findings highlight the potential of supervised generative AI to enhance decision-making accuracy and user trust in emergency healthcare settings, despite trade-offs with response time. The study underscores the importance of clinician oversight and the need for further refinement of AI systems to improve safety. Future research should explore strategies to optimize AI supervision and assess the implementation of these systems in real-world clinical settings.
About the journal:
International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings.
The scope of the journal covers:
Information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration, etc.;
Computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.;
Educational computer-based programs pertaining to medical informatics or medicine in general;
Organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.