Educational Utility of Clinical Vignettes Generated in Japanese by ChatGPT-4: Mixed Methods Study.

IF 3.2 Q1 EDUCATION, SCIENTIFIC DISCIPLINES JMIR Medical Education Pub Date : 2024-08-13 DOI:10.2196/59133
Hiromizu Takahashi, Kiyoshi Shikino, Takeshi Kondo, Akira Komori, Yuji Yamada, Mizue Saita, Toshio Naito
{"title":"Educational Utility of Clinical Vignettes Generated in Japanese by ChatGPT-4: Mixed Methods Study.","authors":"Hiromizu Takahashi, Kiyoshi Shikino, Takeshi Kondo, Akira Komori, Yuji Yamada, Mizue Saita, Toshio Naito","doi":"10.2196/59133","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Evaluating the accuracy and educational utility of artificial intelligence-generated medical cases, especially those produced by large language models such as ChatGPT-4 (developed by OpenAI), is crucial yet underexplored.</p><p><strong>Objective: </strong>This study aimed to assess the educational utility of ChatGPT-4-generated clinical vignettes and their applicability in educational settings.</p><p><strong>Methods: </strong>Using a convergent mixed methods design, a web-based survey was conducted from January 8 to 28, 2024, to evaluate 18 medical cases generated by ChatGPT-4 in Japanese. In the survey, 6 main question items were used to evaluate the quality of the generated clinical vignettes and their educational utility, which are information quality, information accuracy, educational usefulness, clinical match, terminology accuracy (TA), and diagnosis difficulty. Feedback was solicited from physicians specializing in general internal medicine or general medicine and experienced in medical education. Chi-square and Mann-Whitney U tests were performed to identify differences among cases, and linear regression was used to examine trends associated with physicians' experience. Thematic analysis of qualitative feedback was performed to identify areas for improvement and confirm the educational utility of the cases.</p><p><strong>Results: </strong>Of the 73 invited participants, 71 (97%) responded. The respondents, primarily male (64/71, 90%), spanned a broad range of practice years (from 1976 to 2017) and represented diverse hospital sizes throughout Japan. The majority deemed the information quality (mean 0.77, 95% CI 0.75-0.79) and information accuracy (mean 0.68, 95% CI 0.65-0.71) to be satisfactory, with these responses being based on binary data. The average scores assigned were 3.55 (95% CI 3.49-3.60) for educational usefulness, 3.70 (95% CI 3.65-3.75) for clinical match, 3.49 (95% CI 3.44-3.55) for TA, and 2.34 (95% CI 2.28-2.40) for diagnosis difficulty, based on a 5-point Likert scale. Statistical analysis showed significant variability in content quality and relevance across the cases (P<.001 after Bonferroni correction). Participants suggested improvements in generating physical findings, using natural language, and enhancing medical TA. The thematic analysis highlighted the need for clearer documentation, clinical information consistency, content relevance, and patient-centered case presentations.</p><p><strong>Conclusions: </strong>ChatGPT-4-generated medical cases written in Japanese possess considerable potential as resources in medical education, with recognized adequacy in quality and accuracy. Nevertheless, there is a notable need for enhancements in the precision and realism of case details. This study emphasizes ChatGPT-4's value as an adjunctive educational tool in the medical field, requiring expert oversight for optimal application.</p>","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":null,"pages":null},"PeriodicalIF":3.2000,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11350316/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"JMIR Medical Education","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2196/59133","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EDUCATION, SCIENTIFIC DISCIPLINES","Score":null,"Total":0}
引用次数: 0

Abstract

Background: Evaluating the accuracy and educational utility of artificial intelligence-generated medical cases, especially those produced by large language models such as ChatGPT-4 (developed by OpenAI), is crucial yet underexplored.

Objective: This study aimed to assess the educational utility of ChatGPT-4-generated clinical vignettes and their applicability in educational settings.

Methods: Using a convergent mixed methods design, a web-based survey was conducted from January 8 to 28, 2024, to evaluate 18 medical cases generated by ChatGPT-4 in Japanese. In the survey, 6 main question items were used to evaluate the quality of the generated clinical vignettes and their educational utility, which are information quality, information accuracy, educational usefulness, clinical match, terminology accuracy (TA), and diagnosis difficulty. Feedback was solicited from physicians specializing in general internal medicine or general medicine and experienced in medical education. Chi-square and Mann-Whitney U tests were performed to identify differences among cases, and linear regression was used to examine trends associated with physicians' experience. Thematic analysis of qualitative feedback was performed to identify areas for improvement and confirm the educational utility of the cases.

Results: Of the 73 invited participants, 71 (97%) responded. The respondents, primarily male (64/71, 90%), spanned a broad range of practice years (from 1976 to 2017) and represented diverse hospital sizes throughout Japan. The majority deemed the information quality (mean 0.77, 95% CI 0.75-0.79) and information accuracy (mean 0.68, 95% CI 0.65-0.71) to be satisfactory, with these responses being based on binary data. The average scores assigned were 3.55 (95% CI 3.49-3.60) for educational usefulness, 3.70 (95% CI 3.65-3.75) for clinical match, 3.49 (95% CI 3.44-3.55) for TA, and 2.34 (95% CI 2.28-2.40) for diagnosis difficulty, based on a 5-point Likert scale. Statistical analysis showed significant variability in content quality and relevance across the cases (P<.001 after Bonferroni correction). Participants suggested improvements in generating physical findings, using natural language, and enhancing medical TA. The thematic analysis highlighted the need for clearer documentation, clinical information consistency, content relevance, and patient-centered case presentations.

Conclusions: ChatGPT-4-generated medical cases written in Japanese possess considerable potential as resources in medical education, with recognized adequacy in quality and accuracy. Nevertheless, there is a notable need for enhancements in the precision and realism of case details. This study emphasizes ChatGPT-4's value as an adjunctive educational tool in the medical field, requiring expert oversight for optimal application.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
由 ChatGPT-4 生成的日语临床小故事的教育效用:混合方法研究。
背景:评估人工智能生成的医学病例,尤其是由大型语言模型(如ChatGPT-4,由OpenAI开发)生成的医学病例的准确性和教育效用至关重要,但这方面的探索还不够:本研究旨在评估 ChatGPT-4 生成的临床案例的教育效用及其在教育环境中的适用性:方法:采用聚合混合方法设计,于 2024 年 1 月 8 日至 28 日开展了一项基于网络的调查,对 ChatGPT-4 生成的 18 个日语医疗案例进行了评估。调查中使用了 6 个主要问题项目来评估生成的临床案例的质量及其教育实用性,即信息质量、信息准确性、教育实用性、临床匹配性、术语准确性(TA)和诊断难度。我们向具有医学教育经验的普通内科或全科医生征求了反馈意见。采用卡方检验(Chi-square)和曼-惠特尼U检验(Mann-Whitney U)来确定不同病例之间的差异,并使用线性回归来研究与医生经验相关的趋势。对定性反馈进行了主题分析,以确定需要改进的地方,并确认案例的教育效用:在 73 名受邀参与者中,有 71 人(97%)做出了回应。受访者主要为男性(64/71,90%),执业年限跨度很大(从 1976 年到 2017 年),代表了日本各地不同规模的医院。大多数人认为信息质量(平均值 0.77,95% CI 0.75-0.79)和信息准确性(平均值 0.68,95% CI 0.65-0.71)令人满意,这些回答基于二进制数据。根据 5 点李克特量表,教育有用性的平均得分为 3.55(95% CI 3.49-3.60),临床匹配度为 3.70(95% CI 3.65-3.75),TA 为 3.49(95% CI 3.44-3.55),诊断难度为 2.34(95% CI 2.28-2.40)。统计分析表明,不同病例在内容质量和相关性方面存在明显差异(PC 结论:用日语编写的 ChatGPT-4 生成的医学病例在质量和准确性方面都得到了认可,具有作为医学教育资源的巨大潜力。然而,在病例细节的精确性和真实性方面仍有明显的改进需求。本研究强调了 ChatGPT-4 作为医学领域辅助教育工具的价值,需要专家的监督才能达到最佳应用效果。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
JMIR Medical Education
JMIR Medical Education Social Sciences-Education
CiteScore
6.90
自引率
5.60%
发文量
54
审稿时长
8 weeks
期刊最新文献
Enhancing Medical Interview Skills Through AI-Simulated Patient Interactions: Nonrandomized Controlled Trial. Enhancing Digital Health Awareness and mHealth Competencies in Medical Education: Proof-of-Concept Study and Summative Process Evaluation of a Quality Improvement Project. Performance of ChatGPT in the In-Training Examination for Anesthesiology and Pain Medicine Residents in South Korea: Observational Study. Critical Analysis of ChatGPT 4 Omni in USMLE Disciplines, Clinical Clerkships, and Clinical Skills. Challenges and Needs in Digital Health Practice and Nursing Education Curricula: Gap Analysis Study.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1