Background: Traditional Chinese medicine (TCM) has been widely used to treat various diseases in China for thousands of years and has shown satisfactory effectiveness. However, many surveys have found that TCM receives little recognition from Western medicine (WM) physicians and students. At present, TCM is offered as a compulsory course for WM students in WM schools.
Objective: This study aimed to investigate whether TCM courses can affect WM students' attitudes toward TCM.
Methods: WM students from Xiangya Medical School were invited to complete a web-based questionnaire before and immediately after a TCM course. Their attitudes toward TCM and treatment preferences for different kinds of diseases were tested. The Attitude Scale of TCM (ASTCM) was used. The main part of the ASTCM was designed to measure medical students' attitudes toward TCM. It consisted of 18 items, divided into a cognitive dimension (5 items), an emotional dimension (8 items), and a behavioral tendency dimension (5 items).
Results: Finally, the results of 118 five-year program (FYP) and 36 eight-year program (EYP) students were included. For FYP students, there was a significant increase in the total mean ASTCM score after the TCM course (66.42, SD 7.66 vs 71.43, SD 7.38; P<.001). Significant increases in the mean scores of the 3 attitude factors (cognition: 21.64, SD 2.08 vs 22.90, SD 1.94; affection: 25.21, SD 4.39 vs 27.96, SD 4.4; and behavioral tendency: 19.577, SD 3.02 vs 20.58, SD 2.76; P<.001) were also observed. Except for the behavioral tendency score (17.50, SD 3.54 vs 18.78, SD 3.22; P=.02), no significant increase was detected in the total score, cognition, or affection for EYP students (total score: mean 60.36, SD 10.53 vs mean 62.92, SD 10.05; cognition: mean 20.50, SD 2.73 vs mean 20.69, SD 2.73; and affection: mean 22.36, SD 6.32 vs mean 23.44, SD 5.84; all P>.05). The treatment preferences of FYP students for acute (P=.02), chronic (P=.003), and physical diseases (P=.02) changed significantly. Significant changes were also detected for internal diseases (P=.02), surgical diseases (perioperative period; P=.01), and mental illnesses (P=.02) among EYP students. These changes mainly appeared as a decline in preference for WM alone and an increase in preference for combined TCM and WM.
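For readers who want to see the arithmetic behind such pre/post comparisons, here is a minimal sketch in Python. The abstract does not name the statistical test, so a paired t test is assumed, and the scores are simulated placeholders, not the study's raw data:

```python
# Sketch: a pre/post comparison of total ASTCM scores with a paired t test.
# The score vectors are simulated stand-ins; the abstract reports only
# group means and SDs.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
pre = rng.normal(66.4, 7.7, 118)        # hypothetical pre-course totals (FYP, n=118)
post = pre + rng.normal(5.0, 4.0, 118)  # hypothetical post-course gain

t, p = stats.ttest_rel(post, pre)       # paired test: same students, two time points
print(f"t={t:.2f}, P={p:.3g}")
```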
Conclusions: The study showed that earlier exposure to a TCM course increased positive attitudes toward TCM among students majoring in WM. The results provide some suggestions for arranging TCM courses in WM schools.
{"title":"The Effect of a Traditional Chinese Medicine Course on Western Medicine Students' Attitudes Toward Traditional Chinese Medicine: Self-Controlled Pre-Post Questionnaire Study.","authors":"Haoyu He, Hanjin Cui, Li Guo, Yanhui Liao","doi":"10.2196/55972","DOIUrl":"https://doi.org/10.2196/55972","url":null,"abstract":"<p><strong>Background: </strong>Traditional Chinese medicine (TCM) hasbeen widely used to treat various diseases in China for thousands of years and has shown satisfactory effectiveness. However, many surveys found that TCM receives little recognition from Western medicine (WM) physicians and students. At present, TCM is offered as a compulsory course for WM students in WM schools.</p><p><strong>Objective: </strong>This study aimed to investigate whether TCM courses can affect the WM students' attitude toward TCM.</p><p><strong>Methods: </strong>WM students from Xiangya Medical School were invited to completeaweb-based questionnaire before and immediately after a TCM course. Their attitude toward TCM and treatment preferences for different kinds of diseases were tested. The Attitude Scale of TCM (ASTCM) was used. The main part of the ASTCM was designed to measure the attitude of medical students towardTCM. It consisted of 18 items, divided into cognitive dimension (5 terms), emotional dimension (8 terms), and behavioral tendencyfactor (5 terms).</p><p><strong>Results: </strong>Finally, the results of 118 five-year program (FYP) and 36 eight-year program (EYP) students were included. For FYP students, there was a significant increase in the total mean score (66.42, SD 7.66 vs 71.43, SD 7.38;P<.001) of ASTCM after the TCM course. Significant increases in mean scores of the 3 factors of attitude (cognition: 21.64, SD 2.08 vs 22.90, SD 1.94; affection: 25.21, SD 4.39 vs 27.96, SD 4.4; and behavioral tendency: 19.577, SD 3.02 vs 20.58, SD 2.76; P<.001)were also observed. Except for the score of behavioral tendency (17.50, SD 3.54 vs 18.78, SD 3.22; P=.02), a significant increase was not detected in total score, cognition, and affection in EPY students (total score: mean 60.36, SD 10.53 vs mean 62.92, SD 10.05; cognition: mean 20.50, SD 2.73 vs mean 20.69, SD 2.73; and affection: mean 22.36, SD 6.32 vs mean 23.44, SD 5.84; all P>.05). The treatment preference of FYP students in acute (P=.02), chronic (P=.003), and physical diseases (P=.02) showed remarkable change. A major change was also detected in internal diseases (P=.02), surgical diseases (perioperative period; P=.01), and mental illnesses (P=.02) in EYP students. This change mainly appeared as a decline in WM preference and an increase in TCM and WM preference.</p><p><strong>Conclusions: </strong>The study showed that earlier exposure to the TCM course increased the positive attitude toward TCM in students majoring in WM. The results provide some suggestions for arraging TCM courses in WM schools.</p>","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"12 ","pages":"e55972"},"PeriodicalIF":3.2,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145991326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hayoung K Donnelly, David Mandell, Sy Hwang, Emily Schriver, Ugurcan Vurgun, Graydon Neill, Esha Patel, Megan E Reilly, Michael Steinberg, Amber Calloway, Robert Gallop, Maria A Oquendo, Gregory K Brown, Danielle L Mowery
Background: The use of artificial intelligence (AI) to analyze health care data has become common in behavioral health sciences. However, the lack of training opportunities for mental health professionals limits clinicians' ability to adopt AI in clinical settings. AI education is essential for trainees, equipping them with the literacy needed to implement AI tools in practice, collaborate effectively with data scientists, and develop as interdisciplinary researchers with computing skills.
Objective: As part of the Penn Innovation in Suicide Prevention Implementation Research Center, we developed, implemented, and evaluated a virtual workshop to educate psychiatry and psychology trainees on using AI for suicide prevention research.
Methods: The workshop introduced trainees to natural language processing (NLP) concepts and Python coding skills using Jupyter notebooks within a secure Microsoft Azure Databricks cloud computing and analytics environment. We designed a 3-hour workshop that covered 4 key NLP topics: data characterization, data standardization, concept extraction, and statistical analysis. To demonstrate real-world applications, we processed chief complaints from electronic health records to compare the prevalence of suicide-related encounters across populations by race, ethnicity, and age. Training materials were developed based on standard NLP techniques and domain-specific tasks, such as preprocessing psychiatry-related acronyms. Two researchers drafted and demonstrated the code, incorporating feedback from the Methods Core of the Innovation in Suicide Prevention Implementation Research Center to refine the materials. To evaluate the effectiveness of the workshop, we used the Kirkpatrick program evaluation model, focusing on participants' reactions (level 1) and learning outcomes (level 2). Confidence changes in knowledge and skills before and after the workshop were assessed using paired t tests, and open-ended questions were included to gather feedback for future improvements.
Results: A total of 10 trainees participated in the workshop virtually, including residents, postdoctoral researchers, and graduate students from the psychiatry and psychology departments. The participants found the workshop helpful (mean 3.17 on a scale of 1-4, SD 0.41). Their overall confidence in NLP knowledge significantly increased (P=.002) from 1.35 (SD 0.47) to 2.79 (SD 0.46). Confidence in coding abilities also improved significantly (P=.01), increasing from 1.33 (SD 0.60) to 2.25 (SD 0.42). Open-ended feedback suggested incorporating thematic analysis and exploring additional datasets for future workshops.
Conclusions: This study illustrates the effectiveness of a tailored data science workshop for trainees in psychiatry and psychology, focusing on applying NLP techniques for suicide prevention research. The workshop significantly enhanced participants' confidence in their NLP knowledge and coding abilities.
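As an illustration of the data standardization and concept extraction steps described above, the following sketch expands psychiatry acronyms in chief complaints and flags suicide-related encounters. The acronym map and keyword list are assumptions for illustration, not the workshop's actual materials:

```python
# Sketch: standardize chief-complaint text (expand psychiatry acronyms)
# and extract a simple suicide-related concept flag.
import re

ACRONYMS = {"si": "suicidal ideation", "sa": "suicide attempt", "hi": "homicidal ideation"}
SUICIDE_TERMS = {"suicide", "suicidal ideation", "suicide attempt", "self-harm"}

def standardize(complaint: str) -> str:
    """Lowercase the complaint and expand acronyms token by token."""
    tokens = re.findall(r"[a-z]+", complaint.lower())
    return " ".join(ACRONYMS.get(t, t) for t in tokens)

def is_suicide_related(complaint: str) -> bool:
    """Flag encounters whose standardized text mentions a suicide concept."""
    text = standardize(complaint)
    return any(term in text for term in SUICIDE_TERMS)

print(is_suicide_related("Pt w/ SI, plan unclear"))  # True
```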
{"title":"Data Science Education for Residents, Researchers, and Students in Psychiatry and Psychology: Program Development and Evaluation Study.","authors":"Hayoung K Donnelly, David Mandell, Sy Hwang, Emily Schriver, Ugurcan Vurgun, Graydon Neill, Esha Patel, Megan E Reilly, Michael Steinberg, Amber Calloway, Robert Gallop, Maria A Oquendo, Gregory K Brown, Danielle L Mowery","doi":"10.2196/75125","DOIUrl":"https://doi.org/10.2196/75125","url":null,"abstract":"<p><strong>Background: </strong>The use of artificial intelligence (AI) to analyze health care data has become common in behavioral health sciences. However, the lack of training opportunities for mental health professionals limits clinicians' ability to adopt AI in clinical settings. AI education is essential for trainees, equipping them with the literacy needed to implement AI tools in practice, collaborate effectively with data scientists, and develop skills as interdisciplinary researchers with computing skills.</p><p><strong>Objective: </strong>As part of the Penn Innovation in Suicide Prevention Implementation Research Center, we developed, implemented, and evaluated a virtual workshop to educate psychiatry and psychology trainees on using AI for suicide prevention research.</p><p><strong>Methods: </strong>The workshop introduced trainees to natural language processing (NLP) concepts and Python coding skills using Jupyter notebooks within a secure Microsoft Azure Databricks cloud computing and analytics environment. We designed a 3-hour workshop that covered 4 key NLP topics: data characterization, data standardization, concept extraction, and statistical analysis. To demonstrate real-world applications, we processed chief complaints from electronic health records to compare the prevalence of suicide-related encounters across populations by race, ethnicity, and age. Training materials were developed based on standard NLP techniques and domain-specific tasks, such as preprocessing psychiatry-related acronyms. Two researchers drafted and demonstrated the code, incorporating feedback from the Methods Core of the Innovation in Suicide Prevention Implementation Research to refine the materials. To evaluate the effectiveness of the workshop, we used the Kirkpatrick program evaluation model, focusing on participants' reactions (level 1) and learning outcomes (level 2). Confidence changes in knowledge and skills before and after the workshop were assessed using paired t tests, and open-ended questions were included to gather feedback for future improvements.</p><p><strong>Results: </strong>A total of 10 trainees participated in the workshop virtually, including residents, postdoctoral researchers, and graduate students from the psychiatry and psychology departments. The participants found the workshop helpful (mean 3.17 on a scale of 1-4, SD 0.41). Their overall confidence in NLP knowledge significantly increased (P=.002) from 1.35 (SD 0.47) to 2.79 (SD 0.46). Confidence in coding abilities also improved significantly (P=.01), increasing from 1.33 (SD 0.60) to 2.25 (SD 0.42). Open-ended feedback suggested incorporating thematic analysis and exploring additional datasets for future workshops.</p><p><strong>Conclusions: </strong>This study illustrates the effectiveness of a tailored data science workshop for trainees in psychiatry and psychology, focusing on applying NLP techniques for suicide prevention research. 
The workshop significantly enhanced ","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"12 ","pages":"e75125"},"PeriodicalIF":3.2,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145991329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Brandon C J Cheah, Shefaly Shorey, Jun Hong Ch'ng, Chee Wah Tan
Unlabelled: This paper proposes a framework for leveraging large language models (LLMs) to generate misconceptions as a tool for collaborative learning in health care education. While misconceptions, particularly those generated by AI, are often viewed as detrimental to learning, we present an alternative perspective: that LLM-generated misconceptions, when addressed through structured peer discussion, can promote conceptual change and critical thinking. The paper outlines use cases across health care disciplines, including both clinical and basic science contexts, and offers practical 10-step guidance for educators implementing the framework. It also highlights the need for medium- to long-term research to evaluate the impact of LLM-supported learning on student outcomes. This framework may support health care educators globally in integrating emerging AI technologies into their teaching, regardless of disciplinary focus.
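A minimal sketch of how an educator might generate a discussion-ready misconception with an LLM is shown below, using the OpenAI chat completions API as one possible backend; the prompt wording, model choice, and function name are illustrative assumptions, not the paper's prescribed implementation:

```python
# Sketch: generate one plausible misconception for structured peer discussion.
# Assumes the openai (v1+) client and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def generate_misconception(topic: str) -> str:
    prompt = (
        f"State one plausible but incorrect belief a health care student "
        f"might hold about {topic}. Do not reveal that it is incorrect; "
        f"it will be debunked in a peer discussion."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Students then critique the returned statement in small groups.
print(generate_misconception("antibiotic resistance"))
```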
{"title":"Implementing Large Language Models to Support Misconception-Based Collaborative Learning in Health Care Education.","authors":"Brandon C J Cheah, Shefaly Shorey, Jun Hong Ch'ng, Chee Wah Tan","doi":"10.2196/81875","DOIUrl":"https://doi.org/10.2196/81875","url":null,"abstract":"<p><strong>Unlabelled: </strong>This paper proposes a framework for leveraging large language models (LLMs) to generate misconceptions as a tool for collaborative learning in health care education. While misconceptions-particularly those generated by AI-are often viewed as detrimental to learning, we present an alternative perspective: that LLM-generated misconceptions, when addressed through structured peer discussion, can promote conceptual change and critical thinking. The paper outlines use cases across health care disciplines, including both clinical and basic science contexts, and a practical 10-step guidance for educators to implement the framework. It also highlights the need for medium- to long-term research to evaluate the impact of LLM-supported learning on student outcomes. This framework may support health care educators globally in integrating emerging AI technologies into their teaching, regardless of the disciplinary focus.</p>","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"12 ","pages":"e81875"},"PeriodicalIF":3.2,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145991349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: Objective Structured Clinical Examinations (OSCEs) are used as an evaluation method in medical education, but require significant pedagogical expertise and investment, especially in emerging fields like digital health. Large language models (LLMs), such as ChatGPT (OpenAI), have shown potential in automating educational content generation. However, OSCE generation using LLMs remains underexplored.
Objective: This study aims to evaluate 3 GPT-4o configurations for generating OSCE stations in digital health: (1) standard GPT with a simple prompt and OSCE guidelines; (2) personalized GPT with a simple prompt, OSCE guidelines, and a reference book in digital health; and (3) simulated-agents GPT with a structured prompt simulating specialized OSCE agents and the digital health reference book.
Methods: Overall, 24 OSCE stations were generated across 8 digital health topics with each GPT-4o configuration. Format compliance was evaluated by one expert, while educational content was assessed independently by 2 digital health experts, blind to GPT-4o configurations, using a comprehensive assessment grid. Statistical analyses were performed using Kruskal-Wallis tests.
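As a sketch of the statistical step named above, the following compares hypothetical per-station quality ratings across the 3 configurations with a Kruskal-Wallis test; the rating values are placeholders (8 stations per configuration, matching the 24-station design):

```python
# Sketch: Kruskal-Wallis comparison of a content-quality criterion
# across the 3 GPT-4o configurations. Ratings are illustrative.
from scipy.stats import kruskal

standard   = [3.5, 3.8, 4.0, 3.6, 3.9, 3.7, 4.1, 3.8]  # one mean rating per station
personal   = [4.0, 4.2, 3.9, 4.1, 4.3, 4.0, 4.2, 4.1]
sim_agents = [4.4, 4.6, 4.5, 4.3, 4.7, 4.5, 4.6, 4.4]

h, p = kruskal(standard, personal, sim_agents)
print(f"H={h:.2f}, P={p:.3f}")
```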
Results: Simulated-agents GPT performed best in format compliance and on most content quality criteria, including accuracy (mean 4.47/5, SD 0.28; P=.01) and clarity (mean 4.46/5, SD 0.52; P=.004). It was also rated usable without major revisions in 88% (14/16) of cases and ranked first in expert preference, outperforming the other configurations. Personalized GPT showed the lowest format compliance, while standard GPT scored lowest for clarity and educational value.
Conclusions: Structured prompting strategies, particularly the simulation of specialized agents, enhance the reliability and usability of LLM-generated OSCE content. These results support the use of artificial intelligence in medical education while confirming the need for expert validation.
{"title":"AI-Driven Objective Structured Clinical Examination Generation in Digital Health Education: Comparative Analysis of Three GPT-4o Configurations.","authors":"Zineb Zouakia, Emmanuel Logak, Alan Szymczak, Jean-Philippe Jais, Anita Burgun, Rosy Tsopra","doi":"10.2196/82116","DOIUrl":"https://doi.org/10.2196/82116","url":null,"abstract":"<p><strong>Background: </strong>Objective Structured Clinical Examinations (OSCEs) are used as an evaluation method in medical education, but require significant pedagogical expertise and investment, especially in emerging fields like digital health. Large language models (LLMs), such as ChatGPT (OpenAI), have shown potential in automating educational content generation. However, OSCE generation using LLMs remains underexplored.</p><p><strong>Objective: </strong>This study aims to evaluate 3 GPT-4o configurations for generating OSCE stations in digital health: (1) standard GPT with a simple prompt and OSCE guidelines; (2) personalized GPT with a simple prompt, OSCE guidelines, and a reference book in digital health; and (3) simulated-agents GPT with a structured prompt simulating specialized OSCE agents and the digital health reference book.</p><p><strong>Methods: </strong>Overall, 24 OSCE stations were generated across 8 digital health topics with each GPT-4o configuration. Format compliance was evaluated by one expert, while educational content was assessed independently by 2 digital health experts, blind to GPT-4o configurations, using a comprehensive assessment grid. Statistical analyses were performed using Kruskal-Wallis tests.</p><p><strong>Results: </strong>Simulated-agents GPT performed best in format compliance and most content quality criteria, including accuracy (mean 4.47/5, SD 0.28; P=.01) and clarity (mean 4.46/5, SD 0.52; P=.004). It also had 88% (14/16) for usability without major revisions and first-place preference ranking, outperforming the other configurations. Personalized GPT showed the lowest format compliance, while standard GPT scored lowest for clarity and educational value.</p><p><strong>Conclusions: </strong>Structured prompting strategies, particularly agents' simulation, enhance the reliability and usability of LLM-generated OSCE content. These results support the use of artificial intelligence in medical education, while confirming the need for expert validation.</p>","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"12 ","pages":"e82116"},"PeriodicalIF":3.2,"publicationDate":"2026-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145991377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: Foundational knowledge of anesthesia techniques is essential for medical students. Team-based learning (TBL) improves engagement. Web-based virtual environments (WBVEs) allow many learners to join the same session in real time while being guided by an instructor.
Objective: This study aimed to compare a WBVE with face-to-face (F2F) delivery of the same TBL curriculum in terms of postclass knowledge and learner satisfaction.
Methods: We conducted a randomized, controlled, assessor-blinded trial at a Thai medical school from August 2024 to January 2025. Eligible participants were fifth-year medical students from the Faculty of Medicine, Khon Kaen University, who attended the anesthesiology course at the department of anesthesiology. Students who had previously completed the anesthesiology course or were unable to comply with the study protocol were excluded. Participants were allocated to the WBVE (on the Spatial platform) or F2F group using a computer-generated sequence with allocation concealment. Both groups received identical 10-section content in a standardized TBL sequence lasting 130 minutes. Only the delivery mode differed (Spatial WBVE vs classroom F2F). The primary outcome was the postclass multiple-choice questionnaire score. The secondary outcome was learner satisfaction. Individual knowledge was assessed before and after the session using a 15-item multiple-choice questionnaire via Google Forms. Satisfaction was measured immediately after class on a 5-point Likert scale. Outcome scoring and data analysis were blinded to group assignment. Participants and instructors were not blinded.
Results: In total, 79 students were randomized in this study (F2F: n=38, 48%; WBVE: n=41, 52%). We excluded 2% (1/41) of the students in the WBVE group due to incomplete data, leaving complete data for 78 participants (F2F: n=38, 49%; WBVE: n=40, 51%). Preclass scores were similar between groups (F2F: mean 6.03, SD 2.05; WBVE: mean 6.20, SD 2.04). Postclass knowledge did not differ significantly (F2F: mean 11.24, SD 1.93; WBVE: mean 10.40, SD 2.62; mean difference 0.88, 95% CI -0.18 to 1.94; P=.12). Learner satisfaction favored F2F learning across multiple domains, including overall course satisfaction (mean difference 0.42, 95% CI 0.07-0.77; P=.01). Both groups ran as planned. No adverse events were reported, and no technical failures occurred in the WBVE group.
Conclusions: In this trial, WBVE-delivered TBL produced similar short-term knowledge gains to F2F delivery, but learner satisfaction was lower in the WBVE group. Unlike many previous studies, this trial compared WBVE and F2F delivery while keeping the TBL curriculum and prespecified outcomes identical across groups. These findings support WBVEs as a scalable option when physical space or learner numbers are constrained. However, the lower satisfaction with the WBVE highlights real-world needs to improve facilitation, user experience design, and technical readiness before broad implementation.
Trial Registration: Thai Clinical Trials Registry TCTR20240708012; https://www.thaiclinicaltrials.org/show/TCTR20240708012
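A minimal sketch of the primary-outcome analysis follows, assuming a two-sample mean difference with a 95% CI computed from simulated scores (not the trial data) and a simple pooled degrees-of-freedom approximation:

```python
# Sketch: mean difference in postclass scores with a 95% CI.
# Scores are simulated stand-ins matching the reported group sizes.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
f2f  = rng.normal(11.24, 1.93, 38)   # hypothetical postclass scores, F2F group
wbve = rng.normal(10.40, 2.62, 40)   # hypothetical postclass scores, WBVE group

diff = f2f.mean() - wbve.mean()
se = np.sqrt(f2f.var(ddof=1) / len(f2f) + wbve.var(ddof=1) / len(wbve))
dof = len(f2f) + len(wbve) - 2       # simple approximation to the t df
lo, hi = diff + np.array([-1, 1]) * stats.t.ppf(0.975, dof) * se
print(f"mean difference {diff:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```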
{"title":"Web-Based Virtual Environment Versus Face-To-Face Delivery for Team-Based Learning of Anesthesia Techniques Among Undergraduate Medical Students: Randomized Controlled Trial.","authors":"Darunee Sripadungkul, Suhattaya Boonmak, Monsicha Somjit, Narin Plailaharn, Wimonrat Sriraj, Polpun Boonmak","doi":"10.2196/80097","DOIUrl":"https://doi.org/10.2196/80097","url":null,"abstract":"<p><strong>Background: </strong>Foundational knowledge of anesthesia techniques is essential for medical students. Team-based learning (TBL) improves engagement. Web-based virtual environments (WBVEs) allow many learners to join the same session in real time while being guided by an instructor.</p><p><strong>Objective: </strong>This study aimed to compare a WBVE with face-to-face (F2F) delivery of the same TBL curriculum in terms of postclass knowledge and learner satisfaction.</p><p><strong>Methods: </strong>We conducted a randomized, controlled, assessor-blinded trial at a Thai medical school from August 2024 to January 2025. Eligible participants were fifth-year medical students from the Faculty of Medicine, Khon Kaen University, who attended the anesthesiology course at the department of anesthesiology. Students who had previously completed the anesthesiology course or were unable to comply with the study protocol were excluded. They were allocated to one of the groups using a computer-generated sequence, with concealment of allocation to WBVE (on the Spatial platform) or F2F sessions. Both groups received identical 10-section content in a standardized TBL sequence lasting 130 minutes. Only the delivery mode differed (Spatial WBVE vs classroom F2F). The primary outcome was the postclass multiple-choice questionnaire score. The secondary outcome was learner satisfaction. Individual knowledge was assessed before and after the session using a 15-item questionnaire containing multiple-choice questions via Google Forms. Satisfaction was measured immediately after class on a 5-point Likert scale. Outcome scoring and data analysis were blinded to group assignment. Participants and instructors were not blinded.</p><p><strong>Results: </strong>In total, 79 students were randomized in this study (F2F: n=38, 48%; WBVE: n=41, 52%). We excluded 2% (1/41) of the students in the WBVE group due to incomplete data. There were complete data for the analysis for 78 participants (F2F: n=38, 49%; WBVE: n=40, 51%). Preclass scores were similar between groups (F2F: mean 6.03, SD 2.05; WBVE: mean 6.20, SD 2.04). Postclass knowledge did not differ significantly (F2F: mean 11.24, SD 1.93; WBVE: mean 10.40, SD 2.62; mean difference 0.88, 95% CI -0.18 to 1.94; P=.12). Learner satisfaction favored F2F learning across multiple domains, including overall course satisfaction. Overall satisfaction favored F2F learning (mean difference 0.42, 95% CI 0.07-0.77; P=.01). Both groups ran as planned. No adverse events were reported. No technical failures occurred in the WBVE group.</p><p><strong>Conclusions: </strong>In this trial, WBVE-delivered TBL produced similar short-term knowledge gains to F2F delivery, but learner satisfaction was lower in the WBVE group. Unlike many previous studies, this trial compared WBVE and F2F delivery while keeping the TBL curriculum and prespecified outcomes identical across groups. 
These findings support WBVEs as a scalable option when physical sp","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"12 ","pages":"e80097"},"PeriodicalIF":3.2,"publicationDate":"2026-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145991397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: In the current era of artificial intelligence (AI), use of AI has increased in both clinical practice and medical education. Nevertheless, perspectives on the prospects and risks of AI likely vary among individuals. Given the potential for attitudes toward AI to significantly influence its integration into medical practice and educational initiatives, it is essential to assess these attitudes using a validated tool. The recently developed 12-item Attitudes Towards Artificial Intelligence scale has demonstrated good validity and reliability for the general population, suggesting its potential for extensive use in future studies. However, to our knowledge, there is currently no validated Japanese version of the scale. This lack hinders research and educational efforts aimed at understanding and improving AI integration into the Japanese health care and medical education system.
Objective: We aimed to develop the Japanese version of the 12-item Attitudes Towards Artificial Intelligence scale (J-ATTARI-12) and investigate whether it is applicable to medical trainees.
Methods: We first translated the original English-language scale into Japanese. To examine its psychometric properties, we then conducted a validation survey by distributing the translated version as an online questionnaire to medical students and residents across Japan from June 2025 to July 2025. We assessed structural validity through factor analysis and convergent validity by computing the Pearson correlation coefficient between J-ATTARI-12 scores and scores on attitudes toward robots. Internal consistency reliability was assessed using Cronbach α values.
Results: We included 326 participants in our analysis. We used a split-half validation approach, with exploratory factor analysis (EFA) on the first half and confirmatory factor analysis (CFA) on the second half. EFA suggested a 2-factor solution (factor 1: AI anxiety and aversion; factor 2: AI optimism and acceptance). CFA revealed that the model fit indices of the 2-factor structure suggested by the EFA were good (comparative fit index=0.914 [>0.900]; root mean square error of approximation=0.075 [<0.080]; standardized root mean square residual=0.056 [<0.080]) and superior to those of the 1-factor structure. The Pearson correlation coefficient between the J-ATTARI-12 scores and the attitude toward robots scores was 0.52, indicating good convergent validity. The Cronbach α for all 12 items was 0.84, indicating a high level of internal consistency reliability.
Conclusions: We developed and validated the J-ATTARI-12. The developed instrument had good structural validity, convergent validity, and internal consistency reliability for medical trainees. The J-ATTARI-12 is expected to stimulate future studies and educational initiatives.
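As an illustration of the reliability analysis, here is a short sketch computing Cronbach α from an items-by-respondents matrix; the simulated responses are placeholders for the 326 participants' data:

```python
# Sketch: Cronbach alpha for a 12-item scale.
# alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: shape (n_respondents, n_items)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(2)
base = rng.normal(3, 1, (326, 1))                  # shared attitude factor
responses = base + rng.normal(0, 0.8, (326, 12))   # 12 correlated items
print(f"alpha={cronbach_alpha(responses):.2f}")
```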
{"title":"Adaptation of the Japanese Version of the 12-Item Attitudes Towards Artificial Intelligence Scale for Medical Trainees: Multicenter Development and Validation Study.","authors":"Hirohisa Fujikawa, Hirotake Mori, Kayo Kondo, Yuji Nishizaki, Yuichiro Yano, Toshio Naito","doi":"10.2196/81986","DOIUrl":"https://doi.org/10.2196/81986","url":null,"abstract":"<p><strong>Background: </strong>In the current era of artificial intelligence (AI), use of AI has increased in both clinical practice and medical education. Nevertheless, it is probable that perspectives on the prospects and risks of AI vary among individuals. Given the potential for attitudes toward AI to significantly influence its integration into medical practice and educational initiatives, it is essential to assess these attitudes using a validated tool. The recently developed 12-item Attitudes Towards Artificial Intelligence scale has demonstrated good validity and reliability for the general population, suggesting its potential for extensive use in future studies. However, to our knowledge, there is currently no validated Japanese version of the scale. The lack of a Japanese version hinders research and educational efforts aimed at understanding and improving AI integration into the Japanese health care and medical education system.</p><p><strong>Objective: </strong>We aimed to develop the Japanese version of the 12-item Attitudes Towards Artificial Intelligence scale (J-ATTARI-12) and investigate whether it is applicable to medical trainees.</p><p><strong>Methods: </strong>We first translated the original English-language scale into Japanese. To examine its psychometric properties, we then conducted a validation survey by distributing the translated version as an online questionnaire to medical students and residents across Japan from June 2025 to July 2025. We assessed structural validity through factor analysis and convergent validity by computing the Pearson correlation coefficient between the J-ATTARI-12 scores and scores on attitude toward robots. Internal consistency reliability was assessed using Cronbach α values.</p><p><strong>Results: </strong>We included 326 participants in our analysis. We used a split-half validation approach, with exploratory factor analysis (EFA) on the first half and confirmatory factor analysis on the second half. EFA suggested a 2-factor solution (factor 1: AI anxiety and aversion; factor 2: AI optimism and acceptance). Confirmatory factor analysis revealed that the model fitness indexes of the 2-factor structure suggested by the EFA were good (comparative fit index=0.914 [>0.900]; root mean square error of approximation=0.075 [<0.080]; standardized root mean square residual=0.056 [<0.080]) and superior to those of the 1-factor structure. The value of the Pearson correlation coefficient between the J-ATTARI-12 scores and the attitude toward robots scores was 0.52, which indicated good convergent validity. The Cronbach α for all 12 items was 0.84, which indicated a high level of internal consistency reliability.</p><p><strong>Conclusions: </strong>We developed and validated the J-ATTARI-12. The developed instrument had good structural validity, convergent validity, and internal consistency reliability for medical trainees. 
The J-ATTARI-12 is expected to stimulate future studies and educational initiative","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"12 ","pages":"e81986"},"PeriodicalIF":3.2,"publicationDate":"2026-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145991274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: The rapid advancement of artificial intelligence (AI) has had a substantial impact on medicine, necessitating the integration of AI education into medical school curricula. However, such integration remains limited. A key challenge is the discrepancy between medical students' positive perceptions of AI and their actual competencies, with research in Japan identifying specific gaps in students' competencies in understanding regulations and discussing ethical issues.
Objective: This study evaluates the effectiveness of an educational program designed to improve medical students' competencies in understanding legal and ethical AI-related issues. It addresses the following research questions: (1) Does this educational program improve students' knowledge of AI and its legal and ethical issues, and what is each program element's contribution to this knowledge? (2) How does this educational program qualitatively change medical students' thoughts on these issues from an abstract understanding to a concrete and structured thought process?
Methods: This mixed methods study used a single-group pretest and posttest framework involving 118 fourth-year medical students. The 1-day intervention comprised a lecture and a problem-based learning (PBL) session centered on a clinical case. A 24-item multiple-choice questionnaire was administered at 3 time points (pretest, midtest, and posttest), and descriptive essays were collected before and after the intervention. Data were analyzed using linear mixed-effects models, the Wilcoxon signed-rank test, and text mining, including comparative frequency analysis and cooccurrence network analysis with Jaccard coefficients. An optional survey on student perceptions based on the attention, relevance, confidence, and satisfaction model was conducted (n=76, 64.4%).
Results: Objective knowledge scores increased significantly from the pretest (median 17, IQR 15-18) to the posttest (median 19, IQR 17-21; β=1.42; P<.001). No significant difference was observed between score gains during the lecture and PBL phases (P=.54). Qualitative text analysis revealed a significant transformation of cooccurrence network structures (Jaccard coefficients 0.116 and 0.121) from fragmented clusters to integrated networks. Students also used professional and ethical terminology more frequently. For instance, use of the term "bias" in patient explanations increased from 10 (8.5%) at pretest to 25 (21.2%) at posttest, while references to "personal information" in physician precautions increased from 36 (30.5%) to 50 (42.4%). The optional survey indicated that students' confidence (mean 3.78, SD 0.87) was significantly lower than their perception of the program's relevance (mean 4.20, SD 0.71; P<.001).
Conclusions: This PBL-based program was associated with improvements in knowledge and, more importantly, a structural transformation of students' thinking about AI's legal and ethical issues, from abstract understanding toward a concrete and structured thought process.
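A small sketch of the cooccurrence measure used above: the Jaccard coefficient between two terms across a set of essays. The essays here are toy stand-ins for the students' descriptive essays:

```python
# Sketch: Jaccard coefficient for term cooccurrence across documents,
# |docs containing both| / |docs containing either|.
def jaccard(term_a: str, term_b: str, essays: list[str]) -> float:
    has_a = {i for i, e in enumerate(essays) if term_a in e}
    has_b = {i for i, e in enumerate(essays) if term_b in e}
    union = has_a | has_b
    return len(has_a & has_b) / len(union) if union else 0.0

essays = [
    "AI bias can mislead diagnosis; personal information must be protected",
    "explain bias to the patient and obtain consent",
    "the physician should guard personal information",
]
print(jaccard("bias", "personal information", essays))  # 1/3 here
```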
{"title":"Evaluation of a Problem-Based Learning Program's Effect on Artificial Intelligence Ethics Among Japanese Medical Students: Mixed Methods Study.","authors":"Yuma Ota, Yoshikazu Asada, Saori Kubo, Takeshi Kanno, Machiko Saeki Yagi, Yasushi Matsuyama","doi":"10.2196/84535","DOIUrl":"https://doi.org/10.2196/84535","url":null,"abstract":"<p><strong>Background: </strong>The rapid advancement of artificial intelligence (AI) has had a substantial impact on medicine, necessitating the integration of AI education into medical school curricula. However, such integration remains limited. A key challenge is the discrepancy between medical students' positive perceptions of AI and their actual competencies, with research in Japan identifying specific gaps in the students' competencies in understanding regulations and discussing ethical issues.</p><p><strong>Objective: </strong>This study evaluates the effectiveness of an educational program designed to improve medical students' competencies in understanding legal and ethical AI-related issues. It addresses the following research questions: (1) Does this educational program improve students' knowledge of AI and its legal and ethical issues, and what is each program element's contribution to this knowledge? (2) How does this educational program qualitatively change medical students' thoughts on these issues from an abstract understanding to a concrete and structured thought process?</p><p><strong>Methods: </strong>This mixed methods study used a single-group pretest and posttest framework involving 118 fourth-year medical students. The 1-day intervention comprised a lecture and problem-based learning (PBL) session centered on a clinical case. A 24-item multiple-choice questionnaire (MCQ) was administered at 3 time points (pretest, midtest, and posttest), and descriptive essays were collected before and after the intervention. Data were analyzed using linear mixed-effects models, the Wilcoxon signed-rank test, and text mining, including comparative frequency analysis and cooccurrence network analysis with Jaccard coefficients. An optional survey on student perceptions based on the attention, relevance, confidence, and satisfaction model was conducted (n=76, 64.4%).</p><p><strong>Results: </strong>Objective knowledge scores increased significantly from the pretest (median 17, IQR 15-18) to posttest (median 19, IQR 17-21; β=1.42; P<.001). No significant difference was observed between score gains during the lecture and PBL phases (P=.54). Qualitative text analysis revealed the significant transformation of cooccurrence network structures (Jaccard coefficients 0.116 and 0.121) from fragmented clusters to integrated networks. Students also used professional and ethical terminology more frequently. For instance, use of the term \"bias\" in patient explanations increased from 10 (8.5%) at pretest to 25 (21.2%) at posttest, while references to \"personal information\" in physician precautions increased from 36 (30.5%) to 50 (42.4%). 
The optional survey indicated that students' confidence (mean 3.78, SD 0.87) was significantly lower than their perception of the program's relevance (mean 4.20, SD 0.71; P<.001).</p><p><strong>Conclusions: </strong>This PBL-based program was associated with the improvements in knowledge and, more importantly, a structural t","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"12 ","pages":"e84535"},"PeriodicalIF":3.2,"publicationDate":"2026-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145985167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Caroline Sumner, Sami L Case, Samuel Franklin, Kristen Platt
Background: As medical and allied health curricula adapt to increasing time constraints, ethical considerations, and resource limitations, digital innovations are becoming vital supplements to donor-based anatomy instruction. While prior studies have examined the effectiveness of prosection versus dissection and the role of digital tools in anatomy learning, few resources align interactive digital modules directly with hands-on prosection experiences.
Objective: This project addresses that gap by introducing an integrated, curriculum-aligned platform for self-guided cadaveric learning.
Methods: We created Anatomy Interactives, a web-based laboratory manual structured to complement prosection laboratories for MD, DPT, and PA students. Modules were developed using iSpring Suite (iSpring Solutions Incorporated) and included interactive labeled images, donor photographs, and quiz-style self-assessments. Learners engaged with modules before, during, or after laboratory sessions. PA/DPT and MD students completed postcourse surveys evaluating module use and perceived impact. MD student examination scores from a 2023 cohort (no module access) were compared to a 2024 cohort (with access) to evaluate effectiveness.
Results: A total of 147 students completed the survey (31 PA/DPT and 116 MD). The majority reported using the modules for 1-2 hours per week and found them helpful for both written and laboratory examinations. MD students in the 2024 cohort performed better on all 3 examinations compared to the 2023 cohort, with median differences on 2 examinations reaching statistical significance (Mann-Whitney U test, P<.001). Qualitative feedback highlighted accessibility, content reinforcement, and user engagement as key benefits.
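As an illustration of the cohort comparison, a minimal Mann-Whitney U sketch follows; the examination scores are placeholders, not the course data:

```python
# Sketch: comparing examination scores between the 2023 cohort
# (no module access) and the 2024 cohort (with access).
from scipy.stats import mannwhitneyu

scores_2023 = [78, 81, 74, 85, 79, 82, 77, 80]  # illustrative scores
scores_2024 = [84, 88, 83, 90, 86, 85, 89, 87]

u, p = mannwhitneyu(scores_2024, scores_2023, alternative="two-sided")
print(f"U={u}, P={p:.4f}")
```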
Conclusions: Interactive modules integrated with prosection laboratories enhanced learner engagement and performance. This hybrid digital-donor model shows promise for scalable, learner-centered gross anatomy education.
{"title":"Interactive, Image-Based Modules as a Complement to Prosection-Based Anatomy Laboratories: Multicohort Evaluation.","authors":"Caroline Sumner, Sami L Case, Samuel Franklin, Kristen Platt","doi":"10.2196/85028","DOIUrl":"https://doi.org/10.2196/85028","url":null,"abstract":"<p><strong>Background: </strong>As medical and allied health curricula adapt to increasing time constraints, ethical considerations, and resource limitations, digital innovations are becoming vital supplements to donor-based anatomy instruction. While prior studies have examined the effectiveness of prosection versus dissection and the role of digital tools in anatomy learning, few resources align interactive digital modules directly with hands-on prosection experiences.</p><p><strong>Objective: </strong>This project addresses that gap by introducing an integrated, curriculum-aligned platform for self-guided cadaveric learning.</p><p><strong>Methods: </strong>We created Anatomy Interactives, a web-based laboratory manual structured to complement prosection laboratories for MD, DPT, and PA students. Modules were developed using iSpring Suite (iSpring Solutions Incorporated) and included interactive labeled images, donor photographs, and quiz-style self-assessments. Learners engaged with modules before, during, or after laboratory sessions. PA/DPT and MD students completed postcourse surveys evaluating module use and perceived impact. MD student examination scores from a 2023 cohort (no module access) were compared to a 2024 cohort (with access) to evaluate effectiveness.</p><p><strong>Results: </strong>A total of 147 students completed the survey (31 PA/DPT and 116 MD). The majority reported using modules for 1-2 hours per week and found them helpful for both written and laboratory examinations. MD students in the 2024 cohort performed better on all 3 examinations compared to the 2023 cohort, with 2 examination median differences reaching statistical significance (Mann-Whitney U, P<.001). Qualitative feedback highlighted accessibility, content reinforcement, and user engagement as key benefits.</p><p><strong>Conclusions: </strong>Interactive modules integrated with prosection laboratories enhanced learner engagement and performance. This hybrid digital-donor model shows promise for scalable, learner-centered gross anatomy education.</p>","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"12 ","pages":"e85028"},"PeriodicalIF":3.2,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145967279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pau Benito, Mikel Isla-Jover, Pablo González-Castro, Pedro José Fernández Esparcia, Manuel Carpio, Iván Blay-Simón, Pablo Gutiérrez-Bedia, Maria J Lapastora, Beatriz Carratalá, Carlos Carazo-Casas
Background: In recent years, generative artificial intelligence and large language models (LLMs) have rapidly advanced, offering significant potential to transform medical education. Several studies have evaluated the performance of chatbots on multiple-choice medical examinations.
Objective: The study aims to assess the performance of two LLMs, GPT-4o and OpenAI o1, on the Médico Interno Residente (MIR) 2024 examination, the Spanish national medical test that determines eligibility for competitive medical specialist training positions.
Methods: A total of 176 questions from the MIR 2024 examination were analyzed. Each question was presented individually to the chatbots to ensure independence and prevent memory retention bias. No additional prompts were introduced, to minimize potential bias. For each LLM, response consistency under verification prompting was assessed by systematically asking, "Are you sure?" after each response. Accuracy was defined as the percentage of correct responses compared to the official answers provided by the Spanish Ministry of Health. It was assessed for GPT-4o, OpenAI o1, and, as benchmarks, for a consensus of medical specialists and for the average MIR candidate. Subanalyses included performance across different medical subjects, question difficulty (quintiles based on the percentage of examinees correctly answering each question), and question types (clinical cases vs theoretical questions; positive vs negative questions).
Results: Overall accuracy was 89.8% (158/176) for GPT-4o and 90% (160/176) after verification prompting, 92.6% (163/176) for OpenAI o1 and 93.2% (164/176) after verification prompting, 94.3% (166/176) for the consensus of medical specialists, and 56.6% (100/176) for the average MIR candidate. Both LLMs and the consensus of medical specialists outperformed the average MIR candidate across all 20 medical subjects analyzed, with the LLMs achieving ≥80% accuracy in most domains. A performance gradient was observed: the LLMs' accuracy gradually declined as question difficulty increased. Slightly higher accuracy was observed for clinical cases than for theoretical questions, as well as for positive questions than for negative ones. Both models demonstrated high response consistency, with near-perfect agreement between initial responses and those after verification prompting.
Conclusions: These findings highlight the excellent performance of GPT-4o and OpenAI o1 on the MIR 2024 examination, demonstrating consistent accuracy across medical subjects and question types. The integration of LLMs into medical education presents promising opportunities and is likely to reshape how students prepare for licensing examinations and change our understanding of medical education. Further research should explore how wording, language, prompting techniques, and image-based questions can influence LLMs' accuracy.
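A minimal sketch of the verification-prompting protocol follows, using the OpenAI chat completions API as an assumed backend (the study does not specify its tooling); each question is sent in a fresh conversation to preserve independence:

```python
# Sketch: ask a question, then re-ask "Are you sure?" in the same
# conversation and capture both answers for a consistency check.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def answer_with_verification(question: str) -> tuple[str, str]:
    """Return the initial answer and the answer after verification prompting."""
    messages = [{"role": "user", "content": question}]
    first = client.chat.completions.create(model="gpt-4o", messages=messages)
    initial = first.choices[0].message.content
    messages += [
        {"role": "assistant", "content": initial},
        {"role": "user", "content": "Are you sure?"},
    ]
    second = client.chat.completions.create(model="gpt-4o", messages=messages)
    return initial, second.choices[0].message.content
```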
{"title":"GPT-4o and OpenAI o1 Performance on the 2024 Spanish Competitive Medical Specialty Access Examination: Cross-Sectional Quantitative Evaluation Study.","authors":"Pau Benito, Mikel Isla-Jover, Pablo González-Castro, Pedro José Fernández Esparcia, Manuel Carpio, Iván Blay-Simón, Pablo Gutiérrez-Bedia, Maria J Lapastora, Beatriz Carratalá, Carlos Carazo-Casas","doi":"10.2196/75452","DOIUrl":"10.2196/75452","url":null,"abstract":"<p><strong>Background: </strong>In recent years, generative artificial intelligence and large language models (LLMs) have rapidly advanced, offering significant potential to transform medical education. Several studies have evaluated the performance of chatbots on multiple-choice medical examinations.</p><p><strong>Objective: </strong>The study aims to assess the performance of two LLMs-GPT-4o and OpenAI o1-on the Médico Interno Residente (MIR) 2024 examination, the Spanish national medical test that determines eligibility for competitive medical specialist training positions.</p><p><strong>Methods: </strong>A total of 176 questions from the MIR 2024 examination were analyzed. Each question was presented individually to the chatbots to ensure independence and prevent memory retention bias. No additional prompts were introduced to minimize potential bias. For each LLM, response consistency under verification prompting was assessed by systematically asking, \"Are you sure?\" after each response. Accuracy was defined as the percentage of correct responses compared to the official answers provided by the Spanish Ministry of Health. It was assessed for GPT-4o, OpenAI o1, and, as a benchmark, for a consensus of medical specialists and for the average MIR candidate. Subanalyses included performance across different medical subjects, question difficulty (quintiles based on the percentage of examinees correctly answering each question), and question types (clinical cases vs theoretical questions; positive vs negative questions).</p><p><strong>Results: </strong>Overall accuracy was 89.8% (158/176) for GPT-4o and 90% (160/176) after verification prompting, 92.6% (163/176) for OpenAI o1 and 93.2% (164/176) after verification prompting, 94.3% (166/176) for the consensus of medical specialists, and 56.6% (100/176) for the average MIR candidate. Both LLMs and the consensus of medical specialists outperformed the average MIR candidate across all 20 medical subjects analyzed, with ≥80% LLMs' accuracy in most domains. A performance gradient was observed: LLMs' accuracy gradually declined as question difficulty increased. Slightly higher accuracy was observed for clinical cases compared to theoretical questions, as well as for positive questions compared to negative ones. Both models demonstrated high response consistency, with near-perfect agreement between initial responses and those after the verification prompting.</p><p><strong>Conclusions: </strong>These findings highlight the excellent performance of GPT-4o and OpenAI o1 on the MIR 2024 examination, demonstrating consistent accuracy across medical subjects and question types. The integration of LLMs into medical education presents promising opportunities and is likely to reshape how students prepare for licensing examinations and change our understanding of medical education. 
Further research should explore how the wording, language, prompting techniques, and image-based questions can influence LLMs' accuracy,","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"12 ","pages":"e75452"},"PeriodicalIF":3.2,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12795474/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145960564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jakob E Gamboa, Inge Tamm-Daniels, Roland Flores, Nancy G Sarat Diaz, Mario A Villasenor, Mitchell A Gist, Aidan B Hoie, Christopher Kurinec, Colby G Simmons
Background: Ultrasound-guided regional anesthesia (UGRA) remains underused in low- and middle-income countries due to barriers to training and equipment. Recent advances in portable ultrasound devices and international partnerships have expanded access to UGRA, enhancing patient safety and quality of care.
Objective: This study describes the development and outcomes of a hybrid UGRA training program for anesthesiologists at the Hospital Nacional de Coatepeque (HNC) in Guatemala.
Methods: An educational pilot program for UGRA was developed based on local needs and feedback, comprising 4 weeks of online modules, an in-person educational conference, and 1 month of supervised clinical practice. Evaluation followed the Kirkpatrick framework using preprogram and postprogram surveys adapted from the Global Regional Anesthesia Curricular Engagement model. Outcomes included participants' satisfaction, change in knowledge and skill, and procedural performance. Knowledge and skill assessments were compared before and after the training, and clinical data were recorded for 10 months. Nonparametric tests were used to assess changes and associations with performance outcomes.
Results: All 7 anesthesiologists at HNC completed the training program. Knowledge test scores improved by a median of 20.8% (IQR 13.5%-28.1%; r=0.899; P=.02), and procedural skill ratings increased by a median of 147.1% (IQR 96.9%-197.3%; r=0.904; P=.03) at 1 month and 131.4% (IQR 90.5%-172.3%; r=0.909; P=.04) at 4 months after the program. Participants reported high satisfaction, substantial perceived clinical improvement, and strong motivation. A total of 54 peripheral nerve blocks were performed under direct supervision in the first month, with 187 blocks recorded over 10 months. The supraclavicular brachial plexus block was the most frequently used (66/187, 35.3%) and replaced the standard general anesthetic for upper extremity surgery in 70 patients. The procedure success rate was 96.3% (180/187), with no observed patient complications.
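For illustration, here is a sketch of a pre/post comparison with the Wilcoxon signed-rank test and an r effect size (r = Z/√N), one of the nonparametric analyses named in the Methods; the skill scores are placeholders for the 7 participants:

```python
# Sketch: Wilcoxon signed-rank test on pre/post skill ratings, with an
# r effect size recovered from the two-sided P value (r = |Z| / sqrt(N)).
import numpy as np
from scipy.stats import wilcoxon, norm

pre  = np.array([10, 12, 9, 11, 13, 10, 12])   # illustrative baseline ratings
post = np.array([24, 28, 23, 27, 30, 25, 29])  # illustrative post-training ratings

stat, p = wilcoxon(post, pre)
z = norm.isf(p / 2)            # |Z| equivalent of the two-sided P value
r = z / np.sqrt(len(pre))      # comparable in scale to the r values above
print(f"P={p:.3f}, r={r:.2f}")
```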
Conclusions: This hybrid curriculum enabled the successful implementation of UGRA at a public hospital in Guatemala, safely expanding clinical capabilities and reducing reliance on general anesthesia for upper extremity surgery. This practical training model provides a framework for implementing UGRA in similar resource-limited hospitals.
{"title":"Ultrasound-Guided Regional Anesthesia in a Resource-Limited Hospital: Prospective Pilot Study of a Hybrid Training Program.","authors":"Jakob E Gamboa, Inge Tamm-Daniels, Roland Flores, Nancy G Sarat Diaz, Mario A Villasenor, Mitchell A Gist, Aidan B Hoie, Christopher Kurinec, Colby G Simmons","doi":"10.2196/84181","DOIUrl":"10.2196/84181","url":null,"abstract":"<p><strong>Background: </strong>Ultrasound-guided regional anesthesia (UGRA) remains underused in low- and middle-income countries due to barriers to training and equipment. Recent advances in portable ultrasound devices and international partnerships have expanded access to UGRA, enhancing patient safety and quality of care.</p><p><strong>Objective: </strong>This study describes the development and outcomes of a hybrid UGRA training program for anesthesiologists at the Hospital Nacional de Coatepeque (HNC) in Guatemala.</p><p><strong>Methods: </strong>An educational pilot program for UGRA was developed based on local needs and feedback, comprising 4 weeks of online modules, an in-person educational conference, and 1 month of supervised clinical practice. Evaluation followed the Kirkpatrick framework using preprogram and postprogram surveys adapted from the Global Regional Anesthesia Curricular Engagement model. Outcomes included participants' satisfaction, change in knowledge and skill, and procedural performance. Knowledge and skill assessments were compared before and after the training, and clinical data were recorded for 10 months. Nonparametric tests were used to assess changes and associations with performance outcomes.</p><p><strong>Results: </strong>All 7 anesthesiologists at HNC completed the training program. Knowledge test scores improved by a median percentage increase of 20.8% (IQR 13.5%-28.1%; r=0.899; P=.02), and procedural skill rating scores increased by a median percentage of 147.1% (IQR 96.9%-197.3%; r=0.904; P=.03) at 1 month and 131.4% (IQR 90.5%-172.3%; r=0.909; P=.04) at 4 months after the program. Participants self-reported high satisfaction and substantial clinical improvement and motivation. A total of 54 peripheral nerve blocks were performed under direct supervision in the first month, with 187 blocks recorded over 10 months. The supraclavicular brachial plexus block was the most frequently used (66/187, 35.3%) and replaced the standard general anesthetic for upper extremity surgery in 70 patients. The procedure success rate was 96.3% (180/187), and there were no observed patient complications.</p><p><strong>Conclusions: </strong>This hybrid curriculum enabled the successful implementation of UGRA at a public hospital in Guatemala, safely expanding clinical capabilities and reducing reliance on general anesthesia for upper extremity surgery. This practical training model provides a framework for implementing UGRA in similar resource-limited hospitals.</p>","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":" ","pages":"e84181"},"PeriodicalIF":3.2,"publicationDate":"2026-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145811539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}