As artificial intelligence (AI) develops, the medical education community has begun defining the relevant forms of competency. Many experts emphasize the importance of optimizing AI tools' output or understanding the relevant technical and normative considerations around using AI tools. A recent publication in this journal showed that optimizing instructions for large language models may yield diminishing returns as such tools improve. This suggests the need for a new competency: one that focuses on choosing the appropriate AI tools. I briefly summarize the current competency domains and examples to contextualize the current state of AI competency development, highlighting the need for further synthesis. I then introduce a hierarchical framework of competencies that might assist with priority setting around subsequent competency development work. It consists of cognitive, operational, and meta-AI domains, which correspond, respectively, to the knowledge needed for understanding, using, and choosing AI tools. The final section describes the potential challenges associated with developing AI competency. These include traditional concerns around competency-based medical education: deciding whether and which competencies are meaningful for measuring the targets of interest; adjusting the relevant measurements to reflect the necessary temporal and institutional context; and setting up the organizational support needed to encourage measurement of competency. This section also discusses the challenges of developing relevant performance indicators for AI tools across different clinical contexts. Such indicators will be necessary for guiding the choice of AI tools for a given clinical context, but medical educators may not have the skills to develop them. In addition to identifying potential sources of relevant indicators, the medical education community may shape physicians' norms of practice to drive the AI industry toward producing such indicators. The potential for physicians to incur greater medical liability from a poor choice of AI may lead them to demand more nuanced performance indicators from AI suppliers. Physicians are also in a position to do so, since the competitive AI market may give them more bargaining power.
{"title":"AI Competency: Current State and Challenges.","authors":"Sian Tsuei","doi":"10.2196/86686","DOIUrl":"10.2196/86686","url":null,"abstract":"<p><strong>Unlabelled: </strong>As artificial intelligence (AI) develops, the medical education community has begun defining the relevant forms of competency. Many experts emphasize the importance of optimizing AI tools' output or understanding the relevant technical and normative considerations around using AI tools. A recent publication in this journal showed that optimizing instructions for large language models may yield diminishing returns as such tools improve. This suggests the need for a new competency-one that focuses on choosing the appropriate AI tools. I briefly summarize the current competency domains and examples to contextualize the current state of AI competency development, highlighting the need for further synthesis. I then introduce a hierarchical framework of competencies that might assist with priority-setting around subsequent competency development work. It consists of cognitive, operational, and meta-AI domains, which respectively correspond with the knowledge around understanding, using, and choosing AI tools. The final section describes the potential challenges associated with the development of AI competency. These include traditional concerns around competency-based medical education: deciding whether and which competencies are meaningful for measuring the targets of interest; adjusting the relevant measurements to reflect the necessary temporal and institutional context; and setting up the relevant organizational support to encourage measurement of competency. This section also discusses the challenges of developing the relevant performance indicators for AI tools across different clinical contexts. Such indicators will be necessary for guiding the choice of AI tools for the clinical context, but medical educators may not have the skills to develop them. In addition to identifying potential sources for relevant indicators, the medical education community may shape physicians' norms of practice to drive the AI industry into producing the relevant indicators. The potential for physicians to incur higher medical liability from poor choice of AI may lead them to demand more nuanced performance indicators from AI suppliers. Physicians are also in a position to do so, since the competitive AI market may provide them more bargaining power.</p>","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"12 ","pages":"e86686"},"PeriodicalIF":3.2,"publicationDate":"2026-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12978887/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147436045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: In the field of team-based care, pharmacists are vital for optimizing medication therapy. However, many medical professionals lack the opportunity to learn how to propose prescription changes with precision.
Objective: This study aimed to address this knowledge gap by developing and assessing a new educational program for pharmacy students focused on prescription support and interprofessional collaboration.
Methods: We recruited 191 fifth-year pharmacy students during the 2022-2024 academic years. The program featured a 7-day intensive curriculum that included learning how to assist with prescriptions, analyzing clinical data, and engaging in role-playing exercises. A web-based questionnaire and a paper test were used to evaluate students' awareness and knowledge before and after the program. Statistical analyses were performed to verify the significance of changes: we used the Wilcoxon signed-rank test for the ordinal data derived from the specific behavioral objectives and 2-tailed paired t tests for the interval data from the knowledge tests. The magnitude of change was quantified using r for Wilcoxon tests and Cohen dz for 2-tailed t tests, with 95% CIs calculated to assess the stability and reliability of the observed results.
Results: Analysis of the primary outcome (specific behavioral objectives) revealed statistically significant effects across all items (Wilcoxon signed-rank test; P<.001). Effect sizes (r=0.505-0.835) ranged from moderate to large, with particularly large effects observed in identifying content issues (r=0.835, 95% CI 0.126-0.330; P<.001). Knowledge test scores showed significant improvement in 3 subjects: pharmacology (r=-0.504, 95% CI -0.215 to 0.127; P<.001), organic chemistry (r=0.254, 95% CI -0.148 to -0.193; P=.004), and communication (r=0.221, 95% CI -0.151 to -0.190; P=.01). No significant changes were observed in pathology or pharmacokinetics.
Conclusions: This program provides strong evidence that practical, hands-on learning with hospital pharmacists helps improve pharmacy students' professional skills and optimize pharmaceutical therapies in interprofessional care. By teaching pharmacists to propose prescription changes effectively, the program equips them to become integral members of interprofessional care teams, ultimately leading to optimized pharmaceutical care for patients.
{"title":"Prescription Support Practice for Pharmacy Students: Pre-Post Educational Intervention Study.","authors":"Fuka Aizawa, Kenta Yagi, Tsukasa Higashionna, Hirofumi Hamano, Shimon Takahashi, Yoshito Zamami, Kazuaki Shinomiya, Takahiro Niimura, Mitsuhiro Goda, Kei Kawada, Keisuke Ishizawa","doi":"10.2196/79545","DOIUrl":"10.2196/79545","url":null,"abstract":"<p><strong>Background: </strong>In the field of team-based care, pharmacists are vital for optimizing medication therapy. However, many medical professionals lack the opportunity to learn how to propose prescription changes with precision.</p><p><strong>Objective: </strong>This study aimed to address this knowledge gap by developing and assessing a new educational program for pharmacy students focused on prescription support and interprofessional collaboration.</p><p><strong>Methods: </strong>We recruited 191 fifth-year pharmaceutical students during the 2022-2024 academic years. The program featured a 7-day intensive curriculum that included learning how to assist with prescriptions, analyzing clinical data, and engaging in role-playing exercises. A web-based questionnaire and a paper test were used to evaluate students' awareness and knowledge both before and after the program. Statistical analyses were performed to verify the significance of changes; we utilized the Wilcoxon signed-rank test for the ordinal data derived from the specific behavioral objectives and 2-tailed paired t tests for the interval data from the knowledge tests. The magnitude of change was quantified using r for Wilcoxon tests and Cohen dz for 2-tailed t tests, with 95% CI calculated to ensure the stability and reliability of the observed results.</p><p><strong>Results: </strong>Analysis of the primary outcome specific behavioral objectives revealed statistically significant effects across all items (Wilcoxon signed-rank test; P<.001). Effect sizes (r=0.505-0.835) ranged from moderate to large, with particularly large effects observed in identifying contents issue (r=0.835, 95% CI 0.126-0.330; P<.001). Knowledge test scores showed significant improvement in the following 3 subjects: pharmacology (r=-0.504, 95% CI -0.215 to 0.127; P<.001), organic chemistry (r=0.254, 95% CI -0.148 to -0.193; P=.004), and communication (r=0.221, 95% CI -0.151 to -0.190; P=.01). No significant changes were observed in pathology or pharmacokinetics.</p><p><strong>Conclusions: </strong>This program provides strong evidence that practical, hands-on learning with hospital pharmacists helps improve pharmacy students' professional skills and optimize pharmaceutical therapies in interprofessional care. By teaching pharmacists to effectively propose prescription changes, the program equips them to become integral members of interprofessional care, ultimately leading to optimized pharmaceutical care for patients.</p>","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"12 ","pages":"e79545"},"PeriodicalIF":3.2,"publicationDate":"2026-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12954723/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147345448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: With the rapid development of artificial intelligence technology, artificial intelligence-generated content (AIGC) is increasingly widely applied in medical education. Large language models, such as ChatGPT, are a prominent type of AIGC technology. Critical thinking is a core ability in medical education, but the impact of AIGC technology on medical students' critical thinking remains unclear. Medical students are at a crucial stage in cultivating critical thinking, and the introduction of AIGC technology may have a profound impact on this process.
Objective: This study aims to systematically review the impact of AIGC technology on the complex mechanisms affecting medical students' critical thinking abilities and to build a corresponding strategic framework. The findings are intended to provide theoretical support and practical guidance for applying AIGC in medical education.
Methods: This study followed the 2020 PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, with the retrieval scope limited to English-language studies published between November 2022 and June 2025. The PubMed database was searched using a combination of subject terms and free-text words, and studies on the impact of AIGC on medical students' critical thinking were screened using keywords such as "AIGC," "medical students," and "critical thinking." Two independent reviewers screened and evaluated the literature and then conducted a qualitative analysis based on the common themes extracted from the included studies.
Results: The impact of AIGC technology in medical education is 2-fold. First, AIGC's powerful information capabilities provide abundant learning resources and efficient tools, accelerating knowledge acquisition and broadening the scope of learning. Second, overreliance on AIGC may lead to mental inertia, weaken critical thinking skills, and cause academic integrity issues among students. Research has found that strategies such as customized AIGC tools, virtual standardized patients, new models of resource integration, and proactive assessment of AI limitations can effectively compensate for the deficiencies of AIGC in cultivating higher-order critical thinking, helping medical students maintain and enhance their critical thinking and problem-solving abilities.
Conclusions: The application of AIGC technology in medical education requires carefully weighing the pros and cons. By optimizing the design and usage of AIGC tools and combining them with the guidance and supervision of educators, they can be transformed into powerful tools for promoting the development of critical thinking among medical students. Future research should further expand the scope of study, optimize research methods, pay attention to individual differences, track long-term effects, and deeply explore the influence of ethical and cultural …
{"title":"Application of AI-Generated Content in Medical Education: Systematic Review of the Impact on Critical Thinking Abilities of Medical Students.","authors":"Jinlei Li, Fen Ai, Jueyan Wang, Bingxin Cheng, Yu Li, Zhen Chen","doi":"10.2196/79939","DOIUrl":"10.2196/79939","url":null,"abstract":"<p><strong>Background: </strong>With the rapid development of artificial intelligence technology, artificial intelligence-generated content (AIGC) is increasingly widely applied in the field of medical education. Large language models, such as ChatGPT, are a prominent type of AIGC technology. Critical thinking is a core ability in medical education, but the impact of AIGC technology on the critical thinking ability of medical students remains unclear. Medical students are at a crucial stage in cultivating critical thinking, and the intervention of AIGC technology may have a profound impact on this process.</p><p><strong>Objective: </strong>This study aims to systematically review the impact of AIGC technology on the complex mechanisms affecting medical students' critical thinking abilities and build a corresponding strategic framework. The findings are intended to provide theoretical support and practical guidance for applying AIGC in medical education.</p><p><strong>Methods: </strong>This study followed 2020 PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, with the retrieval scope limited to English studies published between November 2022 and June 2025. Through the PubMed database, combined with the search methods of subject terms and free words, relevant studies involving the impact of AIGC on the critical thinking of medical students were screened for using keywords such as \"AIGC,\" \"medical students,\" and \"critical thinking.\" Two independent reviewers screened and evaluated the literature, and ultimately conducted a qualitative analysis based on the common themes extracted from the literature.</p><p><strong>Results: </strong>AIGC technology in medical education is 2-fold. First, AIGC's powerful information capabilities provide abundant learning resources and efficient tools. This accelerates knowledge acquisition and broadens learning scope. Second, overreliance on AIGC may lead to mental inertia, weaken critical thinking skills, and cause academic integrity issues among students. Research has found that strategies such as customized AIGC tools, virtual standardized patients, new models of resource integration, and proactive assessment of AI limitations can effectively make up for the deficiencies of AIGC in cultivating high-level critical thinking, helping medical students maintain and enhance their critical thinking and problem-solving abilities.</p><p><strong>Conclusions: </strong>AIGC technology application in medical education needs to carefully weigh the pros and cons. By optimizing the design and usage of AIGC tools and combining them with the guidance and supervision of educators, they can be transformed into powerful tools for promoting the development of critical thinking among medical students. 
Future research should further expand the scope of study, optimize research methods, pay attention to individual differences, track long-term effects, and deeply explore the influence of ethical and cult","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"12 ","pages":"e79939"},"PeriodicalIF":3.2,"publicationDate":"2026-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12954715/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147345398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
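The review above describes a PubMed search restricted to English-language studies published between November 2022 and June 2025. Below is a minimal sketch of how such a date-limited search can be run against the public NCBI E-utilities esearch endpoint; the query string is an illustrative reconstruction from the keywords quoted in the abstract, not the review's actual search strategy.

```python
# Sketch of a date-limited PubMed search via the NCBI E-utilities esearch endpoint.
# The query term is an illustrative reconstruction, not the review's real strategy.
import requests

query = (
    '("artificial intelligence generated content" OR AIGC OR ChatGPT OR "large language model") '
    'AND ("medical students" OR "medical education") AND "critical thinking"'
)
params = {
    "db": "pubmed",
    "term": query,
    "retmode": "json",
    "retmax": 200,
    "datetype": "pdat",
    "mindate": "2022/11/01",   # review window: November 2022 ...
    "maxdate": "2025/06/30",   # ... to June 2025
}
resp = requests.get(
    "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
    params=params,
    timeout=30,
)
resp.raise_for_status()
result = resp.json()["esearchresult"]
print(f"{result['count']} records; first PMIDs: {result['idlist'][:5]}")
```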
Background: Medical history-taking is a core clinical skill, yet traditional teaching methods face challenges. We developed an artificial intelligence-powered medical history-taking training and evaluation system (AMTES) and established its technical feasibility as an extracurricular resource. Evidence on whether such tools improve learning outcomes when voluntarily embedded in routine curricula remains limited.
Objective: This study aimed to evaluate the real-world educational effectiveness of AMTES as an opt-in extracurricular tool and to examine whether learning gains vary by practice patterns and baseline academic ability.
Methods: We conducted a retrospective cohort study of the 2024-2025 Diagnostics course cohort (N=478) at Shantou University Medical College, China, using total population sampling. Students were categorized as AMTES users (n=205, 42.9%; ≥1 session) and nonusers (n=273, 57.1%) based on their voluntary extracurricular adoption of the system during the month preceding a high-stakes final practical skills examination. To address selection bias, we performed 1:1 propensity score matching via logistic regression using age, sex, and 3 previous academic scores as covariates. The average treatment effect on the treated (ATT) for the final examination score (0-70) was estimated with paired t tests, and robustness to unobserved confounding was assessed via Rosenbaum sensitivity analysis. Among matched users, practice patterns were identified using K-means clustering on log-derived features, with cluster differences compared using Mann-Whitney U tests. We then explored aptitude-treatment interaction by testing the interaction between practice intensity and baseline ability using linear and logistic regression models.
Results: Propensity score matching yielded 157 matched pairs (n=314) with excellent covariate balance (|standardized mean difference|<0.1). In the matched cohort, users outperformed nonusers by 3% (ATT=2.09, 95% CI 0.75-3.42; P=.002). This finding was robust to weak unmeasured confounding (Rosenbaum Γ=1.23). Among users (n=157), cluster analysis of usage logs revealed a low-intensity group (74/157, 47.1%) and a high-intensity group (83/157, 52.9%), which differed in both practice quantity and quality. However, the added effort did not translate into higher scores (mean difference=1.6 points, 95% CI -0.5 to 3.6) or a higher probability of excellence (risk difference=7.7 percentage points, 95% CI -5.0 to 20.5). Exploratory aptitude-treatment interaction analyses suggested ability-dependent effects for excellence rate (β3=1.461; P=.04) and marginally for final score (β3=2.58; P=.07), but not for pass rate (P=.94).
Conclusions: Building upon previous technical validation, this study contributes real-world effectiveness evidence by evaluating AMTES …
{"title":"Real-World Impact and Educational Effectiveness of an AI-Powered Medical History-Taking System: Retrospective Propensity Score-Matched Cohort Study.","authors":"Yang Liu, Yiying Zhu, Weishan Zhang, Xian Lu, Liping Wu, Minghui Yue, Oudong Xia, Chujun Shi","doi":"10.2196/89367","DOIUrl":"10.2196/89367","url":null,"abstract":"<p><strong>Background: </strong>Medical history-taking is a core clinical skill; yet, traditional teaching methods face challenges. We developed an artificial intelligence-powered medical history-taking training and evaluation system (AMTES) and established its technical feasibility as an extracurricular resource. Evidence on whether such tools improve learning outcomes when voluntarily embedded in routine curricula remains limited.</p><p><strong>Objective: </strong>This study aimed to evaluate the real-world educational effectiveness of AMTES as an opt-in extracurricular tool and examine whether learning gains vary by practice patterns and baseline academic ability.</p><p><strong>Methods: </strong>We conducted a retrospective cohort study of the 2024-2025 Diagnostics course cohort (N=478) at Shantou University Medical College, China, using total population sampling. Students were categorized as AMTES users (n=205, 42.9%; ≥1 sessions) and nonusers (n=273, 57.1%) based on their voluntary extracurricular adoption of the system during the month preceding a high-stakes final practical skills examination. To address selection bias, we performed 1:1 propensity score matching via logistic regression using age, sex, and 3 previous academic scores as covariates. The average treatment effect on the treated for final examination score (0-70) was estimated with paired t tests, and robustness to unobserved confounding was assessed via Rosenbaum sensitivity analysis. Among matched users, practice patterns were identified using K-means clustering on log-derived features, with cluster differences compared using Mann-Whitney U tests. Subsequently, we explored aptitude-treatment interaction by testing the interaction between practice intensity and baseline ability using linear and logistic regression models.</p><p><strong>Results: </strong>Propensity score matching yielded 157 matched pairs (n=314) with excellent covariate balance (|standardized mean difference|<0.1). In the matched cohort, the users outperformed nonusers by 3% (average treatment effect on the treated=2.09, 95% CI 0.75-3.42; P=.002). This finding was robust to weak unmeasured confounding (Rosenbaum Γ=1.23). Among users (N=157), cluster analysis of usage logs revealed a low-intensity group (74/157, 47.1%) and a high-intensity group (83/157, 52.9%). The 2 groups reflected differences in both practice quantity and quality. However, the added efforts did not translate into higher scores (mean difference=1.6 points, 95% CI -0.5 to 3.6) or excellence probability (risk difference=7.7 percentage points, 95% CI -5.0 to 20.5). 
Exploratory aptitude-treatment interaction analyses suggested ability-dependent effects for excellence rate (β<sub>3</sub>=1.461; P=.04) and marginally for final score (β<sub>3</sub>=2.58; P=.07), but not for pass rate (P=.94).</p><p><strong>Conclusions: </strong>Building upon previous technical validation, this study contributes real-world effectiveness evidence by evaluating AMTES a","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"12 ","pages":"e89367"},"PeriodicalIF":3.2,"publicationDate":"2026-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12976603/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147285519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
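The study above estimates the average treatment effect on the treated via 1:1 propensity score matching on age, sex, and prior scores, followed by a paired t test. The sketch below illustrates that workflow on synthetic data, using a greedy nearest-neighbour match without a caliper; it is a simplified stand-in for the authors' procedure and omits the Rosenbaum sensitivity analysis, clustering, and interaction steps.

```python
# Sketch of 1:1 propensity score matching followed by a paired t test for the ATT.
# Synthetic data; covariates mirror those named in the abstract (age, sex, and
# 3 previous academic scores) but the generating process is illustrative only.
import numpy as np
from scipy import stats
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 478
age = rng.normal(20, 1, n)
sex = rng.integers(0, 2, n)
prior = rng.normal(70, 8, (n, 3))                      # 3 previous academic scores
X = np.column_stack([age, sex, prior])

# "Treatment" = voluntary use of the training system, loosely tied to prior scores.
p_use = 1 / (1 + np.exp(-(prior.mean(axis=1) - 70) / 8))
treated = (rng.random(n) < p_use).astype(int)
# Final examination score (0-70), with a small true effect of treatment.
exam = 0.7 * prior.mean(axis=1) + 2.0 * treated + rng.normal(0, 5, n)

# Step 1: propensity scores from a logistic regression on the covariates.
ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]

# Step 2: greedy 1:1 nearest-neighbour matching on the propensity score,
# without replacement and without a caliper (a simplification).
treated_idx = np.where(treated == 1)[0]
control_idx = np.where(treated == 0)[0]
used, pairs = set(), []
for i in treated_idx:
    for j in np.argsort(np.abs(ps[control_idx] - ps[i])):
        c = control_idx[j]
        if c not in used:
            used.add(c)
            pairs.append((i, c))
            break

# Step 3: ATT on matched pairs via a paired t test.
t_exam = np.array([exam[i] for i, _ in pairs])
c_exam = np.array([exam[c] for _, c in pairs])
t_stat, p_val = stats.ttest_rel(t_exam, c_exam)
print(f"{len(pairs)} matched pairs; ATT={(t_exam - c_exam).mean():.2f}; P={p_val:.3f}")
```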
Simulation has become an essential pedagogical tool in health professions education, traditionally valued for its ability to approximate clinical practice. Higher simulation fidelity is often assumed to directly enhance learner engagement and improve educational outcomes; however, this assumption oversimplifies a complex relationship. Fidelity is multidimensional, encompassing physical, emotional, and contextual dimensions, as well as qualitative and quantitative considerations, each influencing learners' perception of realism in distinct ways. Engagement is shaped not only by these dimensions of fidelity but also by intrinsic factors such as motivation, prior experience, stress, and emotional resilience, and by extrinsic factors including instructional design, facilitation, debriefing, and psychological safety. A central mediator in this process is the fiction contract, an implicit agreement that enables learners to suspend disbelief and engage authentically despite inherent limitations in realism. Technological sophistication alone does not necessarily translate into greater educational impact. Rather, fidelity should be intentionally aligned with learning objectives: advanced patient simulators may support procedural training, standardized patients may enhance communication skills, and task trainers may optimize focused psychomotor practice. This viewpoint advocates for a goal-oriented, multimodal approach that redefines high-fidelity simulation not as the pursuit of maximum realism, but as the deliberate alignment of fidelity with pedagogy to optimize learner engagement and educational effectiveness.
{"title":"From Realism to Learner Engagement: Rethinking Fidelity in Simulation-Based Education.","authors":"Julien Pico, Jean-Noel Evain, Christina Aron, Gilles Martin, Ilian Cruz-Panesso, Leonida-Mihai Georgescu, Issam Tanoubi","doi":"10.2196/84684","DOIUrl":"10.2196/84684","url":null,"abstract":"<p><strong>Unlabelled: </strong>Simulation has become an essential pedagogical tool in health professions education, traditionally valued for its ability to approximate clinical practice. Higher simulation fidelity is often assumed to directly enhance learner engagement and improve educational outcomes; however, this assumption oversimplifies a complex relationship. Fidelity is multidimensional, encompassing physical, emotional, and contextual dimensions, as well as qualitative and quantitative considerations, each influencing learners' perception of realism in distinct ways. Engagement is shaped not only by these dimensions of fidelity but also by intrinsic factors such as motivation, prior experience, stress, and emotional resilience, and by extrinsic factors including instructional design, facilitation, debriefing, and psychological safety. A central mediator in this process is the fiction contract, an implicit agreement that enables learners to suspend disbelief and engage authentically despite inherent limitations in realism. Technological sophistication alone does not necessarily translate into greater educational impact. Rather, fidelity should be intentionally aligned with learning objectives: advanced patient simulators may support procedural training, standardized patients may enhance communication skills, and task trainers may optimize focused psychomotor practice. This viewpoint advocates for a goal-oriented, multimodal approach that redefines high-fidelity simulation not as the pursuit of maximum realism, but as the deliberate alignment of fidelity with pedagogy to optimize learner engagement and educational effectiveness.</p>","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"12 ","pages":"e84684"},"PeriodicalIF":3.2,"publicationDate":"2026-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12928686/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147277252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: As an emerging delivery mode of education, online continuing medical education (CME) increases the accessibility of high-quality medical training for professionals and students in China. Guoyuan (meaning "nationwide" in Chinese) is an online CME platform delivered via a mobile app and operated by the National Telemedicine Center of China since 2018, serving as an illustrative case of mobile online CME implementation.
Objective: We identified trends in the adoption and usage of the Guoyuan mobile online CME platform from 2018 to 2023 and provided evidence for the application and optimization of online CME.
Methods: We analyzed yearly usage data of the Guoyuan mobile app (The First Affiliated Hospital of Zhengzhou University) from 2018 to 2023 and collected surveys on satisfaction with online CME and recognition of competency enhancement at each connected hospital in 2023. Using IBM SPSS, the nonparametric Kruskal-Wallis H test was used to compare attendance across disciplines, followed by post hoc pairwise comparisons for course types with significant differences; ordinal logistic regression was used to examine factors influencing satisfaction with the online CME system and perceived competency enhancement among the invited doctors.
Results: From 2018 to 2023, Guoyuan had 94,537 registered trainees, 1672 published course videos, and 1,878,437 attendances. Attendance was higher for courses in ophthalmology, otolaryngology, and pathology than in other disciplines (median attendance 610, IQR 105-2055 vs 283, IQR 106-690 participants). Based on a sample of 245 participants, ordinal regression analysis showed that discipline category, professional title, and working years significantly influenced satisfaction. General practitioners showed lower overall satisfaction than internal medicine doctors (odds ratio [OR] 0.323, 95% CI 0.110-0.948; OR 0.251, 95% CI 0.087-0.729; and OR 0.196, 95% CI 0.066-0.585; P=.04; P=.01; P=.003). Junior titles reported higher audio-visual clarity (OR 3.151, 95% CI 1.178-8.427; P=.02) and process satisfaction (OR 4.939, 95% CI 1.674-14.576; P=.004). More experienced doctors had higher system usability (OR 1.102, 95% CI 1.012-1.200; P=.03) and process satisfaction (OR 1.141, 95% CI 1.044-1.247; P=.003). Recognition of online CME's benefits was influenced by multiple factors. Greater clinical experience positively predicted recognition of clinical use (OR 1.106, 95% CI 1.004-1.218; P=.04), while an inverse association was observed with age (OR 0.894, 95% CI 0.802-0.996; P=.04). For research-related benefits, positive predictors included a discipline category of obstetrics and gynecology compared with internal medicine (OR 6.217, 95% CI 1.236-31.258; P=.03) and junior professional title (OR 3.791, 95% CI 1.231-11.673; P=.02), whereas intensive care unit was a negative predictor compared with internal medicine (OR 0.111, 95% CI 0.014-0.893; P=.04).
Conclusions: The online mobile CME platform has been widely adopted among medical professionals in China, particularly after the COVID-19 outbreak. However, disciplinary differences in course availability and user experience remain, indicating a need to further optimize course design and software interaction.
{"title":"The Design and Evaluation of an Online Continuing Medical Education App for Medical Professionals in China: Quantitative Study.","authors":"Xu Zhang, Xianying He, Yuntian Chu, Dongqing Liu, Minzhao Lyu, Weiyi Wang, Haotian Chen, Meihao Ji, Fangfang Cui, Jie Zhao","doi":"10.2196/76299","DOIUrl":"10.2196/76299","url":null,"abstract":"<p><strong>Background: </strong>As an emerging delivery mode of education, online continuing medical education (CME) increases the accessibility of high-quality medical training for professionals and students in China. Guoyuan (meaning \"nationwide\" in Chinese) is an online CME platform delivered via a mobile app and operated by the National Telemedicine Center of China since 2018, serving as an illustrative case of mobile online CME implementation.</p><p><strong>Objective: </strong>We identified trends in the adoption and usage of the Guoyuan mobile online CME platform from 2018 to 2023 and provided evidence for the application and optimization of online CME.</p><p><strong>Methods: </strong>We analyzed yearly usage data of the Guoyuan mobile app (The First Affiliated Hospital of Zhengzhou University) in 2018-2023 and collected surveys on the satisfaction and recognition of competency enhancement in online CME in each connected hospital in 2023. Using the IBM SPSS, the nonparametric Kruskal-Wallis H test was used to compare attendance across different disciplines, followed by post hoc pairwise comparisons for course types with significant differences and ordinal logistic regression analysis to examine factors influencing satisfaction with the online CME system and perceived competency enhancement among invited doctors.</p><p><strong>Results: </strong>From 2018 to 2023, Guoyuan had 94,537 registered trainees, 1672 published course videos, and 1,878,437 attendances. Attendance was higher for courses in ophthalmology, otolaryngology, and pathology than in other disciplines (median attendance 610, IQR 105-2055 vs 283, IQR 106-690 participants). Based on a sample size of 245 participants, ordinal regression analysis showed that discipline category, professional title, and working years significantly influenced satisfaction. General practitioners showed lower overall satisfaction than internal medicine doctors (odds ratio [OR] 0.323, 95% CI 0.110-0.948; OR 0.251, 95% CI 0.087-0.729; and OR 0.196, 95% CI 0.066-0.585; P=.04; P=.01; P=.003). Junior titles reported higher audio-visual clarity (OR 3.151, 95% CI 1.178-8.427; P=.02) and process satisfaction (OR 4.939, 95% CI 1.674-14.576; P=.004). More experienced doctors had higher system usability (OR 1.102, 95% CI 1.012-1.200; P=.03) and process satisfaction (OR 1.141, 95% CI 1.044-1.247; P=.003). Recognition of online CME's benefits was influenced by multiple factors. Greater clinical experience positively predicted recognition of clinical use (OR 1.106, 95% CI 1.004-1.218; P=.04), while an inverse association was observed with age (OR 0.894, 95% CI 0.802-0.996; P=.04). 
For research-related benefits, positive predictors included discipline category in obstetrics and gynecology compared to internal medicine (OR 6.217, 95% CI 1.236-31.258; P=.03) and junior professional title (OR 3.791, 95% CI 1.231-11.673; P=.02), whereas intensive care unit was a negative predictor compared to internal medicine (OR 0.111, ","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"12 ","pages":"e76299"},"PeriodicalIF":3.2,"publicationDate":"2026-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12928682/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147277341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
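The two main analyses reported above, a Kruskal-Wallis H test comparing attendance across disciplines and ordinal (proportional-odds) logistic regression on satisfaction ratings, can be sketched as follows. The data and predictor values are synthetic; the abstract states the original analysis was run in IBM SPSS, so the statsmodels OrderedModel used here is only an assumed open-source equivalent.

```python
# Sketch of a Kruskal-Wallis H test on attendance and an ordinal logistic
# regression on 5-point satisfaction ratings. Synthetic, illustrative data.
# OrderedModel requires statsmodels >= 0.13.
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(2)

# Kruskal-Wallis across three illustrative discipline groups.
ophthalmology = rng.lognormal(6.4, 1.0, 40)
pathology = rng.lognormal(6.3, 1.0, 40)
internal_medicine = rng.lognormal(5.6, 1.0, 60)
h, p = stats.kruskal(ophthalmology, pathology, internal_medicine)
print(f"Kruskal-Wallis H={h:.2f}, P={p:.3f}")

# Ordinal logistic regression: satisfaction vs working years and junior title.
n = 245
working_years = rng.integers(1, 30, n)
junior_title = rng.integers(0, 2, n)
latent = 0.1 * working_years + 0.8 * junior_title + rng.logistic(size=n)
satisfaction = pd.cut(latent, bins=[-np.inf, 0, 1.5, 3, 4.5, np.inf], labels=False) + 1

df = pd.DataFrame({
    "satisfaction": satisfaction,
    "working_years": working_years,
    "junior_title": junior_title,
})
model = OrderedModel(df["satisfaction"], df[["working_years", "junior_title"]], distr="logit")
res = model.fit(method="bfgs", disp=False)
# Exponentiated coefficients give odds ratios for the predictors.
print(np.exp(res.params[["working_years", "junior_title"]]))
```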
Background: Radial artery puncture is a common clinical procedure essential for assessing gas exchange, but it is frequently perceived as stressful by inexperienced operators, who fear causing pain to their patients. Despite its practical relevance, formal training in this procedure is inconsistently integrated into medical curricula. This study evaluated whether a structured training program combining theoretical instruction, simulation-based practice, and debriefing could influence students' procedural confidence and decision-making, as well as patient experience, during their first clinical arterial puncture.
Objective: This study aimed to determine whether structured simulation-based training influences medical students' anxiety, confidence, and technical performance, as well as patient experience, during their first arterial puncture.
Methods: Third-year medical students who had never performed an arterial puncture were assigned to 1 of 2 groups: a structured training group (group 1) or a control group receiving informal or no specific training (group 2). After performing their first arterial puncture under supervision, students completed a questionnaire assessing apprehension, satisfaction, and confidence. The decision to use local anesthesia, puncture success, and patient-rated pain and apprehension were also recorded. A total of 67 students participated (group 1: n=24, 35.8%; group 2: n=43, 64.2%), with 61 patients included. Statistical comparisons were performed using the Fisher exact and nonparametric Mann-Whitney U tests (α=.05).
Results: Self-reported apprehension and confidence were similar between groups. However, group 1 students were significantly less likely to use local anesthesia compared to group 2 students (7/20, 35% vs 28/36, 77.8%, respectively; P=.003), suggesting greater procedural confidence. First-attempt success rates were comparable (group 1: 3/13, 23.1%; group 2: 14/29, 48.3%; P=.18). Median patient-reported pain scores were numerically but not statistically significantly lower when anesthesia was used (2.1, IQR 1.2-4.0 vs 4.8, IQR 2.1-6.4; P=.08).
Conclusions: Structured training influenced students' behavior during their first arterial puncture, reducing reliance on anesthesia despite similar levels of self-reported apprehension. Although confidence ratings did not differ, behavioral indicators suggested improved self-efficacy and readiness for clinical performance. These findings support the behavioral impact of structured procedural education and call for future research using validated assessment tools and long-term follow-up.
{"title":"Impact of a Structured Training Program on Medical Student Confidence and Behavior During Their First Radial Arterial Puncture: Comparative Study.","authors":"Camille Rolland-Debord, Lucien Juret, Mathilde Simon, Abdallah El Mouhajer, Cécile Londner, Capucine Morélot-Panzini, Cécile Chenivesse, Thomas Similowski","doi":"10.2196/78086","DOIUrl":"10.2196/78086","url":null,"abstract":"<p><strong>Background: </strong>Radial artery puncture is a common clinical procedure essential for assessing gas exchange but is frequently perceived as stressful by inexperienced operators, who fear causing pain to their patients. Despite its practical relevance, formal training in this procedure is inconsistently integrated into medical curricula. This study evaluated whether a structured training program-combining theoretical instruction, simulation-based practice, and debriefing-could influence students' procedural confidence and decision-making and patient experience during their first clinical arterial puncture.</p><p><strong>Objective: </strong>This study aimed to determine whether structured simulation-based training influences medical students' anxiety, confidence, and technical performance and patient experience during their first arterial puncture.</p><p><strong>Methods: </strong>Third-year medical students who had never performed an arterial puncture were assigned to 1 of 2 groups: a structured training group (group 1) or a control group receiving informal or no specific training (group 2). After performing their first arterial puncture under supervision, students completed a questionnaire assessing apprehension, satisfaction, and confidence. The decision to use local anesthesia, puncture success, and patient-rated pain and apprehension were also recorded. A total of 67 students participated (group 1: n=24, 35.8%; group 2: n=43, 64.2%), with 61 patients included. Statistical comparisons were performed using the Fisher exact and nonparametric Mann-Whitney U tests (α=.05).</p><p><strong>Results: </strong>Self-reported apprehension and confidence were similar between groups. However, group 1 students were significantly less likely to use local anesthesia compared to group 2 students (7/20, 35% vs 28/36, 77.8%, respectively; P=.003), suggesting greater procedural confidence. First-attempt success rates were comparable (group 1: 3/13, 23.1%; group 2: 14/29, 48.3%; P=.18). Median patient-reported pain scores were numerically but not statistically significantly lower when anesthesia was used (2.1, IQR 1.2-4.0 vs 4.8, IQR 2.1-6.4; P=.08).</p><p><strong>Conclusions: </strong>Structured training influenced students' behavior during their first arterial puncture, reducing reliance on anesthesia despite similar levels of self-reported apprehension. Although confidence ratings did not differ, behavioral indicators suggested improved self-efficacy and readiness for clinical performance. 
These findings support the behavioral impact of structured procedural education and call for future research using validated assessment tools and long-term follow-up.</p>","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"12 ","pages":"e78086"},"PeriodicalIF":3.2,"publicationDate":"2026-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12916088/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146221081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
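A minimal sketch of the two tests named in the Methods above: the Fisher exact test for the anesthesia-use comparison and the Mann-Whitney U test for patient-reported pain. The 2x2 counts follow the abstract; the pain scores are synthetic placeholders rather than study data.

```python
# Sketch of the Fisher exact test and Mann-Whitney U test used in the study above.
import numpy as np
from scipy import stats

# Local anesthesia use: 7/20 in the trained group vs 28/36 in the control group.
table = np.array([[7, 20 - 7],
                  [28, 36 - 28]])
odds_ratio, p_fisher = stats.fisher_exact(table)
print(f"Fisher exact: OR={odds_ratio:.2f}, P={p_fisher:.3f}")

# Patient-reported pain (0-10) with vs without local anesthesia (synthetic values).
rng = np.random.default_rng(3)
pain_with_anesthesia = np.clip(rng.normal(2.5, 1.5, 35), 0, 10)
pain_without_anesthesia = np.clip(rng.normal(4.5, 2.0, 26), 0, 10)
u_stat, p_u = stats.mannwhitneyu(pain_with_anesthesia, pain_without_anesthesia,
                                 alternative="two-sided")
print(f"Mann-Whitney U={u_stat:.1f}, P={p_u:.3f}")
```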
Artificial intelligence (AI) is increasingly influencing medical education by enabling adaptive learning, AI-assisted assessment, and scalable instructional tools. Natural language processing, machine learning, and generative large language models offer innovative ways to support teaching and learning, yet their integration raises ethical, pedagogical, and infrastructural challenges. This viewpoint article aims to examine the current applications, benefits, and challenges of AI in medical education and propose strategies for responsible and effective integration. AI tools such as chatbots, virtual patients, and intelligent tutoring systems enhance personalized and immersive learning. Automated grading and predictive analytics support efficient evaluations, while AI-assisted writing tools streamline content creation. Despite these advances, concerns persist around data privacy, algorithmic bias, unequal access, and diminished critical thinking. Key solutions include AI literacy training, data oversight, equitable infrastructure, and curriculum reform. The FACETS framework offers 6 dimensions (ie, form, application, context, instructional mode, technology, and the SAMR [substitution, augmentation, modification, redefinition] model) to evaluate AI integration effectively. AI offers substantial opportunities to transform medical education, but its adoption must be ethical, equitable, and pedagogically grounded. Strategic frameworks such as FACETS, combined with institutional governance and cross-sector collaboration, are essential to guide implementation so that AI enhances learning outcomes while preserving the humanistic foundations of medical practice.
{"title":"Artificial Intelligence in Medical Education: Transformative Potential, Current Applications, and Future Implications.","authors":"Juan S Izquierdo-Condoy, Marlon Arias-Intriago, Laura Montero Corrales, Esteban Ortiz-Prado","doi":"10.2196/77127","DOIUrl":"10.2196/77127","url":null,"abstract":"<p><strong>Unlabelled: </strong>Artificial intelligence (AI) is increasingly influencing medical education by enabling adaptive learning, AI-assisted assessment, and scalable instructional tools. Natural language processing, machine learning, and generative large language models offer innovative ways to support teaching and learning, yet their integration raises ethical, pedagogical, and infrastructural challenges. This viewpoint article aims to examine the current applications, benefits, and challenges of AI in medical education and propose strategies for responsible and effective integration. AI tools such as chatbots, virtual patients, and intelligent tutoring systems enhance personalized and immersive learning. Automated grading and predictive analytics support efficient evaluations, while AI-assisted writing tools streamline content creation. Despite these advances, concerns persist around data privacy, algorithmic bias, unequal access, and diminished critical thinking. Key solutions include AI literacy training, data oversight, equitable infrastructure, and curriculum reform. The FACETS framework offers 6 dimensions (ie, form, application, context, instructional mode, technology, and the SAMR [substitution, augmentation, modification, redefinition model]) to evaluate AI integration effectively. AI offers substantial opportunities to transform medical education, but its adoption must be ethical, equitable, and pedagogically grounded. Strategic frameworks such as FACETS, combined with institutional governance and cross-sector collaboration, are essential to guide implementation so that AI enhances learning outcomes while preserving the humanistic foundations of medical practice.</p>","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"12 ","pages":"e77127"},"PeriodicalIF":3.2,"publicationDate":"2026-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12912660/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146214420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Trust is increasingly recognized as a cornerstone for the successful implementation of digital public health initiatives, from mobile apps to the use of artificial intelligence in medicine, yet it remains underrepresented in educational curricula. In the course of our research and teaching activities in the field of trust in digital public health and medicine, we identified a gap in existing educational resources aimed at supporting students in conducting structured trust analyses. Digitalization introduces new complexities into trust relationships, as interactions become increasingly mediated by digital tools. Preparing future professionals therefore demands fostering a critical understanding of how trust operates within digital systems, especially in the health sector. To address this gap, we developed and tested the first Trust Analysis Canvas for Teaching (TACT), a tool designed to guide students in conducting trust analyses of case studies in digital public health and medicine. Grounded in conceptual research on trust in health systems and health data sharing, we (1) developed the canvas content and reviewed it with 2 trust researchers; (2) tested and iteratively refined the tool with 23 students (3 BSc, 14 MSc, and 6 PhD) from diverse disciplines and academic levels through in-person and online focus groups at the universities of Zurich and Bern; (3) collaborated with a graphic designer to optimize its visual layout; and (4) translated the final canvas into French, Italian, German, and Spanish to ensure accessibility across disciplines, academic levels, and languages while maintaining a clear and engaging visual design. This paper introduces TACT, a canvas comprising 16 guiding questions organized around 6 core dimensions, designed to enable students from diverse disciplinary backgrounds and academic levels to engage with the complex concept of trust in a structured and guided manner, thereby addressing the identified gap in current curricula. We outline the development process and provide a practical, step-by-step tutorial demonstrating its application through a written trust analysis of a digital health case study, supported by references to relevant literature.
{"title":"Trust Analysis Canvas for Teaching in the Field of Digital Public Health and Medicine: Tutorial.","authors":"Federica Zavattaro, Clara-Maria Barth, Caroline Brall, Viktor von Wyl, Felix Gille","doi":"10.2196/79709","DOIUrl":"10.2196/79709","url":null,"abstract":"<p><strong>Unlabelled: </strong>Trust is increasingly recognized as a cornerstone for the successful implementation of digital public health initiatives, from mobile apps to the use of artificial intelligence in medicine, yet it remains underrepresented in educational curricula. In the course of our research and teaching activities in the field of trust in digital public health and medicine, we identified a gap in existing educational resources that aimed at supporting students in conducting structured trust analyses. Digitalization introduces new complexities into trust relationships, as interactions become increasingly mediated by digital tools. Preparing future professionals, therefore, demands fostering a critical understanding of how trust operates within digital systems, especially in the health sector. To address this gap, we developed and tested the first Trust Analysis Canvas for Teaching (TACT), a tool designed to guide students in conducting trust analyses of case studies in digital public health and medicine. Grounded in conceptual research on trust in health systems and health data sharing, we (1) developed the canvas content and reviewed it with two trust researchers; (2) tested and iteratively refined the tool with 23 students (3 BSc, 14 MSc, and 6 PhD) from diverse disciplines and academic levels through in-person and online focus groups at the universities of Zurich and Bern; (3) collaborated with a graphic designer to optimize its visual layout; and (4) translated the final canvas into French, Italian, German, and Spanish to ensure accessibility across disciplines, academic levels, and languages while maintaining a clear and engaging visual design. This paper introduces TACT, a canvas comprising 16 guiding questions organized around 6 core dimensions, designed to enable students from diverse disciplinary backgrounds and academic levels to engage with the complex concept of trust in a structured and guided manner, thereby addressing the identified gap in the current curricula. We outline the development process and provide a practical, step-by-step tutorial demonstrating its application through a written trust analysis of a digital health case study, supported by references to relevant literature.</p>","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"12 ","pages":"e79709"},"PeriodicalIF":3.2,"publicationDate":"2026-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12912458/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146214431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: Generative artificial intelligence (AI) is increasingly used in medical education, including AI-based virtual patients to improve interview skills. However, how much AI-based assessment (ABA) differs from human-based assessment (HBA) remains unclear.
Objective: This study aimed to compare the quality of clinical interview assessments generated via an ABA (GPT-o1 Pro [ABA-o1] and GPT-5 Pro [ABA-5]) with those generated via an HBA conducted by clinical instructors in an AI-based virtual patient setting. We also examined whether AI reduced evaluation time and assessed agreement across participants with different levels of clinical experience.
Methods: A standardized case of leg weakness was implemented in an AI-based virtual patient. Seven participants (2 medical students, 3 residents, and 2 attending physicians) each conducted an interview with the AI patient, and transcripts were scored using the 25-item Master Interview Rating Scale (0-125). Three evaluation strategies were compared. First, GPT-o1 Pro and GPT-5 Pro scored each transcript 5 times with different random seeds to test case specificity. Processing time was logged automatically. Second, 5 blinded clinical instructors independently rated each transcript once using the same rubric. Third, reliability metrics were applied. For AI, intraclass correlation coefficients (ICCs) quantified repeatability. For humans, the ICC(2,1) was calculated. Agreement was quantified using the Pearson r, Lin concordance correlation coefficient, Bland-Altman limits of agreement, Cronbach α, and ICC. Time efficiency was expressed as mean minutes per transcript and relative percentage reduction.
Results: Mean interview scores were similar across methods (ABA-o1: mean 52.1, SD 6.9; ABA-5: mean 53.2, SD 6.8; HBA: mean 53.7, SD 6.8). Agreement between ABA and HBA was strong (r=0.90; concordance correlation coefficient=0.88) with minimal bias (ABA-o1: mean 0.4, SD 2.7; ABA-5: mean 1.5, SD 5.2; limits of agreement -4.9 to 5.7 for ABA-o1 and -8.6 to 11.7 for ABA-5). The Cronbach α was 0.81 (ABA-o1), 0.86 (ABA-5), and 0.80 (HBA); the ICC(3,1) was 0.77 (ABA-o1) and 0.82 (ABA-5); and the ICC(2,1) was 0.38 (HBA). The coefficient of variation for ABA was approximately half that of HBA (6.6% vs 13.9%). Processing time for 5 runs was 4 minutes 19 seconds for ABA-o1 and 3 minutes 20 seconds for ABA-5 vs 10 minutes 16 seconds for physicians, corresponding to 58% and 67.6% reductions, respectively.
Conclusions: ABA-o1 and ABA-5 produced scores closely matching HBA while demonstrating superior consistency and reliability. In the setting of virtual interview transcripts, these findings suggest that ABA may serve as a valid, rapid, and scalable alternative to HBA, reducing per-assessment time by over half. Applied strategically, AI-based scoring could enable timely feedback, improve efficiency, and reduce …
{"title":"AI- vs Human-Based Assessment of Medical Interview Transcripts in a Generative AI-Simulated Patient System: Cross-Sectional Validation Study.","authors":"Hiromizu Takahashi, Kiyoshi Shikino, Takeshi Kondo, Yuji Yamada, Yoshitaka Tomoda, Minoru Kishi, Yuki Aiyama, Sho Nagai, Akiko Enomoto, Yoshinori Tokushima, Takahiro Shinohara, Fumiaki Sano, Takeshi Matsuura, Rikiya Watanabe, Toshio Naito","doi":"10.2196/81673","DOIUrl":"10.2196/81673","url":null,"abstract":"<p><strong>Background: </strong>Generative artificial intelligence (AI) is increasingly used in medical education, including AI-based virtual patients to improve interview skills. However, how much AI-based assessment (ABA) differs from human-based assessment (HBA) remains unclear.</p><p><strong>Objective: </strong>This study aimed to compare the quality of clinical interview assessments generated via an ABA (GPT-o1 Pro [ABA-o1] and GPT-5 Pro [ABA-5]) with those generated via an HBA conducted by clinical instructors in an AI-based virtual patient setting. We also examined whether AI reduced evaluation time and assessed agreement across participants with different levels of clinical experience.</p><p><strong>Methods: </strong>A standardized case of leg weakness was implemented in an AI-based virtual patient. Seven participants (2 medical students, 3 residents, and 2 attending physicians) each conducted an interview with the AI patient, and transcripts were scored using the 25-item Master Interview Rating Scale (0-125). Three evaluation strategies were compared. First, GPT-o1 Pro and GPT-5 Pro scored each transcript 5 times with different random seeds to test case specificity. Processing time was logged automatically. Second, 5 blinded clinical instructors independently rated each transcript once using the same rubric. Third, reliability metrics were applied. For AI, intraclass correlation coefficients (ICCs) quantified repeatability. For humans, the ICC(2,1) was calculated. Agreement was quantified using the Pearson r, Lin concordance correlation coefficient, Bland-Altman limits of agreement, Cronbach α, and ICC. Time efficiency was expressed as mean minutes per transcript and relative percentage reduction.</p><p><strong>Results: </strong>Mean interview scores were similar across methods (ABA-o1: mean 52.1, SD 6.9; ABA-5: mean 53.2, SD 6.8; HBA: mean 53.7, SD 6.8). Agreement between ABA and HBA was strong (r=0.90; concordance correlation coefficient=0.88) with minimal bias (ABA-o1: mean 0.4, SD 2.7; ABA-5: mean 1.5, SD 5.2; limits of agreement: -4.9 to 5.7 for ABA-o1 and -8.6 to 11.7 for ABA-5). The Cronbach α was 0.81 (ABA-o1), 0.86 (ABA-5), and 0.80 (HBA); the ICC(3,1) was 0.77 (ABA-o1) and 0.82 (ABA-5); and the ICC(2,1) was 0.38 (HBA). The coefficient of variation for ABA was approximately half that of HBA (6.6% vs 13.9%). Processing time for 5 runs was 4 minutes, 19 seconds for ABA-o1 and 3 minutes, 20 seconds for ABA-5 vs 10 minutes, 16 seconds for physicians, corresponding to 58% and 67.6% reductions, respectively.</p><p><strong>Conclusions: </strong>ABA-o1 and ABA-5 produced scores closely matching HBA while demonstrating superior consistency and reliability. In the setting of virtual interview transcripts, these findings suggest that ABA may serve as a valid, rapid, and scalable alternative to HBA, reducing per-assessment time by over half. 
Applied strategically, AI-based scoring could enable timely feedback, improve efficiency, and reduce","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"12 ","pages":"e81673"},"PeriodicalIF":3.2,"publicationDate":"2026-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12912650/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146214342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
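The agreement analysis above combines Pearson r, the Lin concordance correlation coefficient, Bland-Altman limits of agreement, and coefficients of variation. The sketch below computes these from a set of paired AI and human scores using standard textbook formulas; the seven score pairs are synthetic placeholders, not the study data, and this is not the authors' code.

```python
# Sketch of agreement metrics between AI and human raters on a 0-125 scale:
# Pearson r, Lin's concordance correlation coefficient (CCC), Bland-Altman bias
# and 95% limits of agreement, and coefficients of variation. Synthetic scores.
import numpy as np

ai_scores = np.array([48.2, 55.1, 60.4, 45.0, 52.3, 58.8, 44.9])
human_scores = np.array([49.0, 56.5, 61.0, 46.2, 53.7, 57.9, 51.6])

# Pearson correlation.
r = np.corrcoef(ai_scores, human_scores)[0, 1]

# Lin's CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2).
cov_xy = np.cov(ai_scores, human_scores, ddof=1)[0, 1]
ccc = (2 * cov_xy) / (ai_scores.var(ddof=1) + human_scores.var(ddof=1)
                      + (ai_scores.mean() - human_scores.mean()) ** 2)

# Bland-Altman bias and 95% limits of agreement on the paired differences.
diff = ai_scores - human_scores
bias = diff.mean()
loa = (bias - 1.96 * diff.std(ddof=1), bias + 1.96 * diff.std(ddof=1))

# Coefficient of variation of each rater's scores.
cv_ai = ai_scores.std(ddof=1) / ai_scores.mean() * 100
cv_human = human_scores.std(ddof=1) / human_scores.mean() * 100

print(f"r={r:.2f}, CCC={ccc:.2f}, bias={bias:.2f}, LoA=({loa[0]:.2f}, {loa[1]:.2f})")
print(f"CV: AI {cv_ai:.1f}% vs human {cv_human:.1f}%")
```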