Faculty Retreats in Academic Medicine: Tutorial.
Rachel Skains, Julie Brown, Erin F Shufflebarger, Justine McGiboney, Sherell Hicks, Laine McDonald, Katherine B Griesmer, Christine Shaw, Emily Grass, Marie-Carmelle Elie, Lauren A Walter
Faculty development is a cornerstone of academic medicine, supporting personal growth, professional advancement, and departmental effectiveness across all stages of a faculty member's career. Among the tools available, faculty retreats have increasingly emerged as a high-impact strategy to foster collaboration, advance strategic planning, and address individual and collective goals in a structured, reflective setting. While retreats are widely used in other sectors, practical guidance tailored to the academic medicine context remains limited. This tutorial offers a comprehensive, step-by-step framework for planning and implementing faculty retreats within academic departments. Key elements of effective retreat design are outlined, including (1) conducting a preretreat needs assessment to align goals with faculty priorities, (2) selecting an appropriate format (eg, in-person or hybrid), (3) fostering psychological safety to enhance participation, and (4) using facilitation techniques that promote inclusive dialogue and actionable outcomes. The tutorial also emphasizes logistical considerations, such as agenda design, timing, and participant engagement strategies, alongside mechanisms to ensure follow-up and accountability after the retreat. In addition to highlighting common barriers, such as resource limitations, scheduling constraints, and engagement disparities, the tutorial provides practical solutions drawn from real-world examples in academic medicine. By integrating thoughtful planning, evidence-informed facilitation, and postretreat follow-through, faculty retreats can serve as transformative experiences that support both individual development and departmental cohesion. This resource aims to fill a gap in the literature by equipping leaders in academic medicine with a structured approach to designing, executing, and sustaining the benefits of faculty retreats.
{"title":"Faculty Retreats in Academic Medicine: Tutorial.","authors":"Rachel Skains, Julie Brown, Erin F Shufflebarger, Justine McGiboney, Sherell Hicks, Laine McDonald, Katherine B Griesmer, Christine Shaw, Emily Grass, Marie-Carmelle Elie, Lauren A Walter","doi":"10.2196/71622","DOIUrl":"10.2196/71622","url":null,"abstract":"<p><strong>Unlabelled: </strong>Faculty development is a cornerstone of academic medicine, supporting personal growth, professional advancement, and departmental effectiveness across all stages of a faculty member's career. Among the tools available, faculty retreats have increasingly emerged as a high-impact strategy to foster collaboration, advance strategic planning, and address individual and collective goals in a structured, reflective setting. While retreats are widely used in other sectors, practical guidance tailored to the academic medicine context remains limited. This tutorial offers a comprehensive, step-by-step framework for planning and implementing faculty retreats within academic departments. Key elements of effective retreat design are outlined, including (1) conducting a preretreat needs assessment to align goals with faculty priorities, (2) selecting an appropriate format (eg, in-person or hybrid), (3) fostering psychological safety to enhance participation, and (4) using facilitation techniques that promote inclusive dialogue and actionable outcomes. The tutorial also emphasizes logistical considerations, such as agenda design, timing, and participant engagement strategies, alongside mechanisms to ensure follow-up and accountability after the retreat. In addition to highlighting common barriers, such as resource limitations, scheduling constraints, and engagement disparities, the tutorial provides practical solutions drawn from real-world examples in academic medicine. By integrating thoughtful planning, evidence-informed facilitation, and postretreat follow-through, faculty retreats can serve as transformative experiences that support both individual development and departmental cohesion. This resource aims to fill a gap in the literature by equipping leaders in academic medicine with a structured approach to designing, executing, and sustaining the benefits of faculty retreats.</p>","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"11 ","pages":"e71622"},"PeriodicalIF":3.2,"publicationDate":"2025-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12582542/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145439416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Beyond Lectures: Reimagining Psychiatric Didactics for the Age of AI.
Laurent Elkrief, Alexandre Hudon, Giovanni Briganti, Paul Lespérance
The increasing use of generative large language models (LLMs) necessitates a fundamental reevaluation of traditional didactic lectures in medical education, particularly within psychiatry. The specialty's inherent diagnostic ambiguity, biopsychosocial complexity, and reliance on nuanced interpersonal skills demand an educational model that transcends mere information transfer, focusing instead on cultivating sophisticated clinical reasoning. This viewpoint argues for a shift from passive knowledge transmission to active, facilitated development of higher-order thinking, aligning with the Bloom taxonomy. We describe four core propositions: (1) shifting foundational knowledge acquisition to faculty-curated asynchronous artificial intelligence (AI)-assisted micromodules; (2) transforming synchronous time into "Ambiguity Seminars" for discussing nuanced cases, biopsychosocial formulation, and ethical dilemmas, leveraging faculty expertise in guiding reasoning; (3) integrating live LLM critical interaction drills to develop prompt engineering skills and critical appraisal of AI outputs; and (4) realigning assessment methods (eg, objective structured clinical examinations [OSCEs], reflective writing) to evaluate clinical reasoning and integrative skills rather than rote recall. Successful implementation requires comprehensive faculty development, explicit institutional investment, and a phased approach that addresses scalability across varying resource settings. This reimagined approach aims to cultivate clinical wisdom, equipping psychiatric trainees with adaptive reasoning frameworks essential for excellence in an AI-mediated future.
{"title":"Beyond Lectures: Reimagining Psychiatric Didactics for the Age of AI.","authors":"Laurent Elkrief, Alexandre Hudon, Giovanni Briganti, Paul Lespérance","doi":"10.2196/78110","DOIUrl":"10.2196/78110","url":null,"abstract":"<p><p>The increasing use of generative large language models (LLMs) necessitates a fundamental reevaluation of traditional didactic lectures in medical education, particularly within psychiatry. The specialty's inherent diagnostic ambiguity, biopsychosocial complexity, and reliance on nuanced interpersonal skills demand an educational model that transcends mere information transfer, focusing instead on cultivating sophisticated clinical reasoning. This viewpoint argues for a shift from passive knowledge transmission to active, facilitated development of higher-order thinking, aligning with the Bloom taxonomy. We describe four core propositions: (1) shifting foundational knowledge acquisition to faculty-curated asynchronous artificial intelligence (AI)-assisted micromodules; (2) transforming synchronous time into \"Ambiguity Seminars\" for discussing nuanced cases, biopsychosocial formulation, and ethical dilemmas, leveraging faculty expertise in guiding reasoning; (3) integrating live LLM critical interaction drills to develop prompt engineering skills and critical appraisal of AI outputs; and (4) realigning assessment methods (eg, objective structured clinical examinations [OSCEs], reflective writing) to evaluate clinical reasoning and integrative skills rather than rote recall. Successful implementation requires comprehensive faculty development, explicit institutional investment, and a phased approach that addresses scalability across varying resource settings. This reimagined approach aims to cultivate clinical wisdom, equipping psychiatric trainees with adaptive reasoning frameworks essential for excellence in an AI-mediated future.</p>","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"11 ","pages":"e78110"},"PeriodicalIF":3.2,"publicationDate":"2025-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12619012/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145423133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Advantages of a Virtual Collaborative Research Dermatology Laboratory.
Natasha E Barton, Kenny Ta, Angela R Loczi-Storm, Cory A Dunnick, Robert P Dellavalle
The Dellavalle/Dunnick Dermato-Epidemiology Lab transitioned from a single campus to a dual-campus collaboration between the University of Colorado and the University of Minnesota in 2024. Since the 2020 COVID-19 pandemic, the laboratory has been operating on Zoom and allows medical students from any institution to join. This innovative laboratory structure offers students and other researchers unique opportunities to engage in dermatological research and develop professional networks across two large academic institutions. The laboratory's model embraces a virtual collaborative approach, promotes inclusivity, encourages student-led inquiry, and provides a structured environment for professional development and academic output. Through its commitment to diverse student perspectives and interdisciplinary cooperation, the Dellavalle/Dunnick Dermato-Epidemiology Lab creates a new, equitable, nationwide model for research and mentorship in dermatology, supporting medical students, residents, and fellows to navigate future careers in dermatology.
{"title":"Advantages of a Virtual Collaborative Research Dermatology Laboratory.","authors":"Natasha E Barton, Kenny Ta, Angela R Loczi-Storm, Cory A Dunnick, Robert P Dellavalle","doi":"10.2196/65697","DOIUrl":"10.2196/65697","url":null,"abstract":"<p><strong>Unlabelled: </strong>The Dellavalle/Dunnick Dermato-Epidemiology Lab transitioned from a single campus to a dual-campus collaboration between the University of Colorado and the University of Minnesota in 2024. Since the 2020 COVID-19 pandemic, the laboratory has been operating on Zoom and allows medical students from any institution to join. This innovative laboratory structure offers students and other researchers unique opportunities to engage in dermatological research and develop professional networks across two large academic institutions. The laboratory's model embraces a virtual collaborative approach, promotes inclusivity, encourages student-led inquiry, and provides a structured environment for professional development and academic output. Through its commitment to diverse student perspectives and interdisciplinary cooperation, the Dellavalle/Dunnick Dermato-Epidemiology Lab creates a new, equitable, nationwide model for research and mentorship in dermatology, supporting medical students, residents, and fellows to navigate future careers in dermatology.</p>","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"11 ","pages":"e65697"},"PeriodicalIF":3.2,"publicationDate":"2025-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12574937/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145410406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deconstructing Participant Behaviors in Virtual Reality Simulation: Ethnographic Analysis.
Daniel Loeb, Jamie Shoemaker, Kelly Ely, Matthew Zackoff
<p><strong>Background: </strong>Virtual reality (VR)-based simulation is an increasingly popular tool for simulation-based medical education, immersing participants in a realistic, 3D world where health care professionals can observe nuanced examination findings, such as subtle indicators of respiratory distress and skin perfusion. However, it remains unknown how the VR environment affects participant behavior and attention.</p><p><strong>Objective: </strong>This study aimed to describe clinician attention and decision-making behaviors during interprofessional pediatric resuscitation simulations performed in VR. We used video-based focused ethnography to describe how participant attention and behavior are altered in the VR environment and reflect how these changes may affect the educational profile of VR simulation.</p><p><strong>Methods: </strong>The research team analyzed scenarios with the question, "How does a completely virtual reality environment alter participant attention and behavior, and how might these changes impact educational goals?" Video-based focused ethnography consisting of data collection, analysis, and pattern explanation was conducted by experts in critical care, resuscitation, simulation, and medical education until data saturation was achieved.</p><p><strong>Results: </strong>Fifteen interprofessional VR simulation sessions featuring the same scenario-a child with pneumonia and sepsis-were evaluated. Three major themes emerged: Source of Truth, Cognitive Focus, and Fidelity Breakers. Source of Truth explores how participants gather and synthesize information in a VR environment. Participants used the patient's physical examination over ancillary data sources, such as the cardiorespiratory monitor, returning to the monitor when the examination did not align with expectations. Cognitive Focus describes the interplay between thinking, communicating, and doing during a VR simulation. The VR setting imposed unique cognitive demands, requiring participants to process information from multiple sources, make rapid decisions, and execute tasks during the scenario. Participants experienced increased task burden when virtual tasks did not mirror real-world procedures, leading to delays and fixation on certain actions. Fidelity Breakers reflects how technical and environmental factors disrupted focus and hindered learning. Navigational challenges, such as unintended teleportation and difficulties interacting with the virtual patient and equipment, disrupted participant immersion. These challenges underscore the current limitations of VR in reproducing the tactile and procedural aspects of real clinical care.</p><p><strong>Conclusions: </strong>Participants' focus on the physical examination findings in VR, as opposed to the cardiorespiratory monitor, potentially indicates simulation of an identical, more patient examination-centered approach to clinical data gathering. In addition, the multiple data sources allowed for participant cog
{"title":"Deconstructing Participant Behaviors in Virtual Reality Simulation: Ethnographic Analysis.","authors":"Daniel Loeb, Jamie Shoemaker, Kelly Ely, Matthew Zackoff","doi":"10.2196/65886","DOIUrl":"10.2196/65886","url":null,"abstract":"<p><strong>Background: </strong>Virtual reality (VR)-based simulation is an increasingly popular tool for simulation-based medical education, immersing participants in a realistic, 3D world where health care professionals can observe nuanced examination findings, such as subtle indicators of respiratory distress and skin perfusion. However, it remains unknown how the VR environment affects participant behavior and attention.</p><p><strong>Objective: </strong>This study aimed to describe clinician attention and decision-making behaviors during interprofessional pediatric resuscitation simulations performed in VR. We used video-based focused ethnography to describe how participant attention and behavior are altered in the VR environment and reflect how these changes may affect the educational profile of VR simulation.</p><p><strong>Methods: </strong>The research team analyzed scenarios with the question, \"How does a completely virtual reality environment alter participant attention and behavior, and how might these changes impact educational goals?\" Video-based focused ethnography consisting of data collection, analysis, and pattern explanation was conducted by experts in critical care, resuscitation, simulation, and medical education until data saturation was achieved.</p><p><strong>Results: </strong>Fifteen interprofessional VR simulation sessions featuring the same scenario-a child with pneumonia and sepsis-were evaluated. Three major themes emerged: Source of Truth, Cognitive Focus, and Fidelity Breakers. Source of Truth explores how participants gather and synthesize information in a VR environment. Participants used the patient's physical examination over ancillary data sources, such as the cardiorespiratory monitor, returning to the monitor when the examination did not align with expectations. Cognitive Focus describes the interplay between thinking, communicating, and doing during a VR simulation. The VR setting imposed unique cognitive demands, requiring participants to process information from multiple sources, make rapid decisions, and execute tasks during the scenario. Participants experienced increased task burden when virtual tasks did not mirror real-world procedures, leading to delays and fixation on certain actions. Fidelity Breakers reflects how technical and environmental factors disrupted focus and hindered learning. Navigational challenges, such as unintended teleportation and difficulties interacting with the virtual patient and equipment, disrupted participant immersion. These challenges underscore the current limitations of VR in reproducing the tactile and procedural aspects of real clinical care.</p><p><strong>Conclusions: </strong>Participants' focus on the physical examination findings in VR, as opposed to the cardiorespiratory monitor, potentially indicates simulation of an identical, more patient examination-centered approach to clinical data gathering. 
In addition, the multiple data sources allowed for participant cog","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"11 ","pages":"e65886"},"PeriodicalIF":3.2,"publicationDate":"2025-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12571426/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145379136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Application of AI Communication Training Tools in Medical Undergraduate Education: Mixed Methods Feasibility Study Within a Primary Care Context.
Chris Jacobs, Hans Johnson, Nina Tan, Kirsty Brownlie, Richard Joiner, Trevor Thompson
<p><strong>Background: </strong>Effective communication is fundamental to high-quality health care delivery, influencing patient satisfaction, adherence to treatment plans, and clinical outcomes. However, communication skills training for medical undergraduates often faces challenges in scalability, resource allocation, and personalization. Traditional methods, such as role-playing with standardized patients, are resource intensive and may not provide consistent feedback tailored to individual learners' needs. Artificial intelligence (AI) offers realistic patient interactions for education.</p><p><strong>Objective: </strong>This study aims to investigate the application of AI communication training tools in medical undergraduate education within a primary care context. The study evaluates the effectiveness, usability, and impact of AI virtual patients (VPs) on medical students' experience in communication skills practice.</p><p><strong>Methods: </strong>The study used a mixed methods sequential explanatory design, comprising a quantitative survey followed by qualitative focus group discussions. Eighteen participants, including 15 medical students and 3 practicing doctors, engaged with an AI VP simulating a primary care consultation for prostate cancer risk assessment. The AI VP was designed using a large language model and natural voice synthesis to create realistic patient interactions. The survey assessed 5 domains: fidelity, immersion, intrinsic motivation, debriefing, and system usability. Focus groups were used to explore participants' experiences, challenges, and perceived educational value of the AI tool.</p><p><strong>Results: </strong>Significant positive responses emerged against a neutral baseline, with the following median scores: intrinsic motivation 16.5 of 20.0 (IQR 15.0-18.0; d=2.09, P<.001), system usability 12.0 of 15.0 (IQR 11.5-12.5; d=2.18, P<.001), and psychological safety 5.0 of 5.0 (IQR 5.0-5.0; d=4.78, P<.001). Fidelity (median score 6.0/10.0, IQR 5.2-7.0; d=-0.08, P=.02) and immersion (median score 8.5/15.0, IQR 7.0-9.8; d=0.25 P=.08) were moderately rated. The overall Immersive Technology Evaluation Measure scores showed a high positive learning experience: median 47.5 of 65.0 (IQR 43.0-51.2; d=2.00, P<.001). Qualitative analysis identified 3 major themes across 11 subthemes, with participants highlighting both technical limitations and educational value. Participants valued the safe practice environment and the ability to receive immediate feedback.</p><p><strong>Conclusions: </strong>AI VP technology shows promising potential for communication skills training despite the current realism limitations. While it does not yet match human standardized patient authenticity, the technology has achieved sufficient fidelity to support meaningful educational interactions, and this study identified clear areas for improvement. The integration of AI into medical curricula represents a promising avenue for innovation in medical edu
{"title":"Application of AI Communication Training Tools in Medical Undergraduate Education: Mixed Methods Feasibility Study Within a Primary Care Context.","authors":"Chris Jacobs, Hans Johnson, Nina Tan, Kirsty Brownlie, Richard Joiner, Trevor Thompson","doi":"10.2196/70766","DOIUrl":"10.2196/70766","url":null,"abstract":"<p><strong>Background: </strong>Effective communication is fundamental to high-quality health care delivery, influencing patient satisfaction, adherence to treatment plans, and clinical outcomes. However, communication skills training for medical undergraduates often faces challenges in scalability, resource allocation, and personalization. Traditional methods, such as role-playing with standardized patients, are resource intensive and may not provide consistent feedback tailored to individual learners' needs. Artificial intelligence (AI) offers realistic patient interactions for education.</p><p><strong>Objective: </strong>This study aims to investigate the application of AI communication training tools in medical undergraduate education within a primary care context. The study evaluates the effectiveness, usability, and impact of AI virtual patients (VPs) on medical students' experience in communication skills practice.</p><p><strong>Methods: </strong>The study used a mixed methods sequential explanatory design, comprising a quantitative survey followed by qualitative focus group discussions. Eighteen participants, including 15 medical students and 3 practicing doctors, engaged with an AI VP simulating a primary care consultation for prostate cancer risk assessment. The AI VP was designed using a large language model and natural voice synthesis to create realistic patient interactions. The survey assessed 5 domains: fidelity, immersion, intrinsic motivation, debriefing, and system usability. Focus groups were used to explore participants' experiences, challenges, and perceived educational value of the AI tool.</p><p><strong>Results: </strong>Significant positive responses emerged against a neutral baseline, with the following median scores: intrinsic motivation 16.5 of 20.0 (IQR 15.0-18.0; d=2.09, P<.001), system usability 12.0 of 15.0 (IQR 11.5-12.5; d=2.18, P<.001), and psychological safety 5.0 of 5.0 (IQR 5.0-5.0; d=4.78, P<.001). Fidelity (median score 6.0/10.0, IQR 5.2-7.0; d=-0.08, P=.02) and immersion (median score 8.5/15.0, IQR 7.0-9.8; d=0.25 P=.08) were moderately rated. The overall Immersive Technology Evaluation Measure scores showed a high positive learning experience: median 47.5 of 65.0 (IQR 43.0-51.2; d=2.00, P<.001). Qualitative analysis identified 3 major themes across 11 subthemes, with participants highlighting both technical limitations and educational value. Participants valued the safe practice environment and the ability to receive immediate feedback.</p><p><strong>Conclusions: </strong>AI VP technology shows promising potential for communication skills training despite the current realism limitations. While it does not yet match human standardized patient authenticity, the technology has achieved sufficient fidelity to support meaningful educational interactions, and this study identified clear areas for improvement. 
The integration of AI into medical curricula represents a promising avenue for innovation in medical edu","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"11 ","pages":"e70766"},"PeriodicalIF":3.2,"publicationDate":"2025-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12551969/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145369013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Applications, Challenges, and Prospects of Generative Artificial Intelligence Empowering Medical Education: Scoping Review.
Yuhang Lin, Zhiheng Luo, Zicheng Ye, Nuoxi Zhong, Lijian Zhao, Long Zhang, Xiaolan Li, Zetao Chen, Yijia Chen
<p><strong>Background: </strong>Nowadays, generative artificial intelligence (GAI) drives medical education toward enhanced intelligence, personalization, and interactivity. With its vast generative abilities and diverse applications, GAI redefines how educational resources are accessed, teaching methods are implemented, and assessments are conducted.</p><p><strong>Objective: </strong>This study aimed to review the current applications of GAI in medical education; analyze its opportunities and challenges; identify its strengths and potential issues in educational methods, assessments, and resources; and capture GAI's rapid evolution and multidimensional applications in medical education, thereby providing a theoretical foundation for future practice.</p><p><strong>Methods: </strong>This scoping review used PubMed, Web of Science, and Scopus to analyze literature from January 2023 to October 2024, focusing on GAI applications in medical education. Following PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines, 5991 articles were retrieved, with 1304 duplicates removed. The 2-stage screening (title or abstract and full-text review) excluded 4564 articles and a supplementary search included 8 articles, yielding 131 studies for final synthesis. We included (1) studies addressing GAI's applications, challenges, or future directions in medical education, (2) empirical research, systematic reviews, and meta-analyses, and (3) English-language articles. We excluded commentaries, editorials, viewpoints, perspectives, short reports, or communications with low levels of evidence, non-GAI technologies, and studies centered on other fields of medical education (eg, nursing). We integrated quantitative analysis of publication trends and Human Development Index (HDI) with thematic analysis of applications, technical limitations, and ethical implications.</p><p><strong>Results: </strong>Analysis of 131 articles revealed that 74.0% (n=97) originated from countries or regions with very high HDI, with the United States contributing the most (n=33); 14.5% (n=19) were from high HDI countries, 5.3% (n=7) from medium HDI countries, and 2.2% (n=3) from low HDI countries, with 3.8% (n=5) involving cross-HDI collaborations. ChatGPT was the most studied GAI model (n=119), followed by Gemini (n=22), Copilot (n=11), Claude (n=6), and LLaMA (n=4). Thematic analysis indicated that GAI applications in medical education mainly embody the diversification of educational methods, scientific evaluation of educational assessments, and dynamic optimization of educational resources. However, it also highlighted current limitations and potential future challenges, including insufficient scene adaptability, data quality and information bias, overreliance, and ethical controversies.</p><p><strong>Conclusions: </strong>GAI application in medical education exhibits significant regional disparities in development, and model r
{"title":"Applications, Challenges, and Prospects of Generative Artificial Intelligence Empowering Medical Education: Scoping Review.","authors":"Yuhang Lin, Zhiheng Luo, Zicheng Ye, Nuoxi Zhong, Lijian Zhao, Long Zhang, Xiaolan Li, Zetao Chen, Yijia Chen","doi":"10.2196/71125","DOIUrl":"10.2196/71125","url":null,"abstract":"<p><strong>Background: </strong>Nowadays, generative artificial intelligence (GAI) drives medical education toward enhanced intelligence, personalization, and interactivity. With its vast generative abilities and diverse applications, GAI redefines how educational resources are accessed, teaching methods are implemented, and assessments are conducted.</p><p><strong>Objective: </strong>This study aimed to review the current applications of GAI in medical education; analyze its opportunities and challenges; identify its strengths and potential issues in educational methods, assessments, and resources; and capture GAI's rapid evolution and multidimensional applications in medical education, thereby providing a theoretical foundation for future practice.</p><p><strong>Methods: </strong>This scoping review used PubMed, Web of Science, and Scopus to analyze literature from January 2023 to October 2024, focusing on GAI applications in medical education. Following PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines, 5991 articles were retrieved, with 1304 duplicates removed. The 2-stage screening (title or abstract and full-text review) excluded 4564 articles and a supplementary search included 8 articles, yielding 131 studies for final synthesis. We included (1) studies addressing GAI's applications, challenges, or future directions in medical education, (2) empirical research, systematic reviews, and meta-analyses, and (3) English-language articles. We excluded commentaries, editorials, viewpoints, perspectives, short reports, or communications with low levels of evidence, non-GAI technologies, and studies centered on other fields of medical education (eg, nursing). We integrated quantitative analysis of publication trends and Human Development Index (HDI) with thematic analysis of applications, technical limitations, and ethical implications.</p><p><strong>Results: </strong>Analysis of 131 articles revealed that 74.0% (n=97) originated from countries or regions with very high HDI, with the United States contributing the most (n=33); 14.5% (n=19) were from high HDI countries, 5.3% (n=7) from medium HDI countries, and 2.2% (n=3) from low HDI countries, with 3.8% (n=5) involving cross-HDI collaborations. ChatGPT was the most studied GAI model (n=119), followed by Gemini (n=22), Copilot (n=11), Claude (n=6), and LLaMA (n=4). Thematic analysis indicated that GAI applications in medical education mainly embody the diversification of educational methods, scientific evaluation of educational assessments, and dynamic optimization of educational resources. 
However, it also highlighted current limitations and potential future challenges, including insufficient scene adaptability, data quality and information bias, overreliance, and ethical controversies.</p><p><strong>Conclusions: </strong>GAI application in medical education exhibits significant regional disparities in development, and model r","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"11 ","pages":"e71125"},"PeriodicalIF":3.2,"publicationDate":"2025-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12547994/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145348932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automated Evaluation of Reflection and Feedback Quality in Workplace-Based Assessments by Using Natural Language Processing: Cross-Sectional Competency-Based Medical Education Study.
Jeng-Wen Chen, Hai-Lun Tu, Chun-Hsiang Chang, Wei-Chung Hsu, Pa-Chun Wang, Chun-Hou Liao, Mingchih Chen
Background: Competency-based medical education relies heavily on high-quality narrative reflections and feedback within workplace-based assessments. However, evaluating these narratives at scale remains a significant challenge.
Objective: This study aims to develop and apply natural language processing (NLP) models to evaluate the quality of resident reflections and faculty feedback documented in Entrustable Professional Activities (EPAs) on Taiwan's nationwide Emyway platform for otolaryngology residency training.
Methods: This 4-year cross-sectional study analyzed 300 randomly sampled EPA assessments from 2021 to 2025, covering a pilot year and 3 full implementation years. Two medical education experts independently rated the narratives based on relevance, specificity, and the presence of reflective or improvement-focused language. Narratives were categorized into 4 quality levels (effective, moderate, ineffective, or irrelevant) and then dichotomized into high quality and low quality. We compared the performance of logistic regression, support vector machine, and bidirectional encoder representations from transformers (BERT) models in classifying narrative quality. The best-performing model was then applied to track quality trends over time.
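A minimal sketch of the binary narrative-quality classification described in the Methods, using the Hugging Face transformers API. The base checkpoint, label mapping, and example narrative are assumptions; the study's fine-tuning data and hyperparameters are not reported in this abstract, and a freshly initialized classification head would need training on the rated narratives before its predictions mean anything:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "bert-base-multilingual-cased"  # assumed multilingual BERT base checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL, num_labels=2  # 0 = low quality, 1 = high quality (assumed mapping)
)
model.eval()

# Hypothetical resident reflection to classify.
narrative = "I will review the airway checklist before my next OR rotation."
inputs = tokenizer(narrative, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
pred = logits.argmax(dim=-1).item()
print("high quality" if pred == 1 else "low quality")
```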
Results: The BERT model, a multilingual pretrained language model, outperformed other approaches, achieving 85% and 92% accuracy in binary classification for resident reflections and faculty feedback, respectively. The accuracy for the 4-level classification was 67% for both. Longitudinal analysis revealed significant increases in high-quality reflections (from 70.3% to 99.5%) and feedback (from 50.6% to 88.9%) over the study period.
Conclusions: BERT-based NLP demonstrated moderate-to-high accuracy in evaluating the narrative quality in EPA assessments, especially in the binary classification. While not a replacement for expert review, NLP models offer a valuable tool for monitoring narrative trends and enhancing formative feedback in competency-based medical education.
{"title":"Automated Evaluation of Reflection and Feedback Quality in Workplace-Based Assessments by Using Natural Language Processing: Cross-Sectional Competency-Based Medical Education Study.","authors":"Jeng-Wen Chen, Hai-Lun Tu, Chun-Hsiang Chang, Wei-Chung Hsu, Pa-Chun Wang, Chun-Hou Liao, Mingchih Chen","doi":"10.2196/81718","DOIUrl":"10.2196/81718","url":null,"abstract":"<p><strong>Background: </strong>Competency-based medical education relies heavily on high-quality narrative reflections and feedback within workplace-based assessments. However, evaluating these narratives at scale remains a significant challenge.</p><p><strong>Objective: </strong>This study aims to develop and apply natural language processing (NLP) models to evaluate the quality of resident reflections and faculty feedback documented in Entrustable Professional Activities (EPAs) on Taiwan's nationwide Emyway platform for otolaryngology residency training.</p><p><strong>Methods: </strong>This 4-year cross-sectional study analyzes 300 randomly sampled EPA assessments from 2021 to 2025, covering a pilot year and 3 full implementation years. Two medical education experts independently rated the narratives based on relevance, specificity, and the presence of reflective or improvement-focused language. Narratives were categorized into 4 quality levels-effective, moderate, ineffective, or irrelevant-and then dichotomized into high quality and low quality. We compared the performance of logistic regression, support vector machine, and bidirectional encoder representations from transformers (BERT) models in classifying narrative quality. The best performing model was then applied to track quality trends over time.</p><p><strong>Results: </strong>The BERT model, a multilingual pretrained language model, outperformed other approaches, achieving 85% and 92% accuracy in binary classification for resident reflections and faculty feedback, respectively. The accuracy for the 4-level classification was 67% for both. Longitudinal analysis revealed significant increases in high-quality reflections (from 70.3% to 99.5%) and feedback (from 50.6% to 88.9%) over the study period.</p><p><strong>Conclusions: </strong>BERT-based NLP demonstrated moderate-to-high accuracy in evaluating the narrative quality in EPA assessments, especially in the binary classification. While not a replacement for expert review, NLP models offer a valuable tool for monitoring narrative trends and enhancing formative feedback in competency-based medical education.</p>","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"11 ","pages":"e81718"},"PeriodicalIF":3.2,"publicationDate":"2025-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12590046/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145348980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Insights Into History and Trends of Teaching and Learning in Stomatology Education: Bibliometric Analysis.
Ziang Zou, Linna Guo
Background: Stomatology education has experienced substantial transformations over recent decades. Nevertheless, a comprehensive summary encompassing the entirety of this field remains absent in the literature.
Objective: This study aimed to perform a bibliometric analysis to evaluate the research status, current focus, and emerging trends in this field over the last two decades.
Methods: We retrieved publications concerning teaching and learning in stomatology education from the Web of Science core collection covering the period from 2003 to 2023. Subsequently, we conducted a bibliometric analysis and visualization using R-Bibliometrix and CiteSpace.
Results: In total, 5528 publications focusing on teaching and learning in stomatology education were identified. The annual number of publications in this field has shown a consistent upward trend. The United States and the United Kingdom emerged as the leading contributors to research. Among academic institutions, the University of Iowa produced the highest number of publications. The Journal of Dental Education was identified as the journal with the highest citation count. Wanchek T authored the most highly cited articles in the field. Emerging research hotspots were characterized by keywords such as "deep learning," "machine learning," "online learning," "virtual reality," and "convolutional neural network." The thematic map analysis further revealed that "surgery" and "accuracy" were categorized as emerging themes.
Conclusions: The visualization bibliometric analysis of the literature clearly depicts the current hotspots and emerging topics in stomatology education concerning teaching and learning. The findings are intended to serve as a reference to advance the development of stomatology education research globally.
{"title":"Insights Into History and Trends of Teaching and Learning in Stomatology Education: Bibliometric Analysis.","authors":"Ziang Zou, Linna Guo","doi":"10.2196/66322","DOIUrl":"10.2196/66322","url":null,"abstract":"<p><strong>Background: </strong>Stomatology education has experienced substantial transformations over recent decades. Nevertheless, a comprehensive summary encompassing the entirety of this field remains absent in the literature.</p><p><strong>Objective: </strong>This study aimed to perform a bibliometric analysis to evaluate the research status, current focus, and emerging trends in this field over the last two decades.</p><p><strong>Methods: </strong>We retrieved publications concerning teaching and learning in stomatology education from the Web of Science core collection covering the period from 2003 to 2023. Subsequently, we conducted a bibliometric analysis and visualization using R-Bibliometrix and CiteSpace.</p><p><strong>Results: </strong>In total, 5528 publications focusing on teaching and learning in stomatology education were identified. The annual number of publications in this field has shown a consistent upward trend. The United States and the United Kingdom emerged as the leading contributors to research. Among academic institutions, the University of Iowa produced the highest number of publications. The Journal of Dental Education was identified as the journal with the highest citation. Wanchek T authored the most highly cited articles in the field. Emerging research hotspots were characterized by keywords such as \"deep learning,\" \"machine learning,\" \"online learning,\" \"virtual reality,\" and \"convolutional neural network.\" The thematic map analysis further revealed that \"surgery\" and \"accuracy\" were categorized as emerging themes.</p><p><strong>Conclusions: </strong>The visualization bibliometric analysis of the literature clearly depicts the current hotspots and emerging topics in stomatology education concerning teaching and learning. The findings are intended to serve as a reference to advance the development of stomatology education research globally.</p>","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"11 ","pages":"e66322"},"PeriodicalIF":3.2,"publicationDate":"2025-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12536922/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145337635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Correlation Between Electroencephalogram Brain-to-Brain Synchronization and Team Strategies and Tools to Enhance Performance and Patient Safety Scores During Online Hexad Virtual Simulation-Based Interprofessional Education: Cross-Sectional Correlational Study.
Atthaphon Viriyopase, Khuansiri Narajeenron
Background: Team performance is crucial in crisis situations. Although the Thai version of Team Strategies and Tools to Enhance Performance and Patient Safety (TeamSTEPPS) has been validated, challenges remain due to its subjective evaluation. To date, no studies have examined the relationship between electroencephalogram (EEG) activity and team performance, as assessed by TeamSTEPPS, during virtual simulation-based interprofessional education (SIMBIE), where face-to-face communication is absent.
Objective: This study aims to investigate the correlation between EEG-based brain-to-brain synchronization and TeamSTEPPS scores in multiprofessional teams participating in virtual SIMBIE sessions.
Methods: This single-center study involved 90 participants (15 groups of 6 simulated professionals: 1 medical doctor, 2 nurses, 1 pharmacist, 1 medical technologist, and 1 radiological technologist). Each group completed two 30-minute virtual SIMBIE sessions focusing on team training in a crisis situation involving COVID-19 pneumonia with a difficult airway, resulting in 30 sessions in total. The TeamSTEPPS scores of each participant across 5 domains were independently assessed by 2 trained raters based on screen recording, and their average values were used. The scores of participants in the same session were aggregated to generate a group TeamSTEPPS score, representing group-level performance. EEG data were recorded using wireless EEG acquisition devices and computed for total interdependence (TI), which represents brain-to-brain synchronization. The TI values of participants in the same session were aggregated to produce a group TI, representing group-level brain-to-brain synchronization. We investigated the Pearson correlations between the TI and the scores at both the group and individual levels.
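Total interdependence is commonly derived from magnitude-squared coherence, TI = -(1/2π) ∫ ln(1 - Cxy(ω)) dω. A minimal sketch restricted to the alpha band (8-12 Hz) follows, assuming a Welch-based coherence estimate from scipy and integrating in hertz with the constant dropped; the sampling rate, window length, and synthetic signals are assumptions, not the authors' pipeline:

```python
import numpy as np
from scipy.signal import coherence

FS = 256             # Hz, assumed sampling rate
ALPHA = (8.0, 12.0)  # alpha band, as in the Results

def alpha_ti(x: np.ndarray, y: np.ndarray, fs: float = FS) -> float:
    """Band-limited total interdependence of two EEG channels."""
    f, cxy = coherence(x, y, fs=fs, nperseg=fs * 2)  # magnitude-squared coherence
    band = (f >= ALPHA[0]) & (f <= ALPHA[1])
    # Integrate -ln(1 - Cxy) across alpha frequencies (trapezoidal rule).
    return -np.trapz(np.log(1.0 - cxy[band]), f[band])

# Synthetic pair of signals sharing a common drive, 30 s long.
rng = np.random.default_rng(0)
shared = rng.standard_normal(FS * 30)
x = shared + 0.5 * rng.standard_normal(FS * 30)
y = shared + 0.5 * rng.standard_normal(FS * 30)
print(f"alpha-band TI: {alpha_ti(x, y):.3f}")
```

Per the Methods, pairwise TI values within a session would then be aggregated into a single group TI before correlation with the group TeamSTEPPS score.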
Results: Interrater reliability for the TeamSTEPPS scores among 12 raters indicated good agreement on average (mean 0.73, SD 0.18; range 0.32-0.999). At the individual level, the Pearson correlations between the TI and the scores were weak and not statistically significant across all TeamSTEPPS domains (all adjusted P≥.05). However, strongly negative, statistically significant correlations between the group TI and the group TeamSTEPPS scores in the alpha frequency band (8-12 Hz) of the anterior brain area were found across all TeamSTEPPS domains after correcting for multiple comparisons (mean -0.87, SD 0.06; range -0.93 to -0.8).
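The group-level step in the Results (Pearson correlations across the 5 TeamSTEPPS domains, adjusted for multiple comparisons) can be sketched as follows. The abstract does not state which correction was applied, so Bonferroni is shown as one option, and all data are synthetic:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
n_sessions = 30
group_ti = rng.normal(1.0, 0.2, size=n_sessions)  # one anterior-alpha TI per session

p_values = []
for domain in range(5):  # 5 TeamSTEPPS domains
    # Synthetic scores constructed with a negative dependence on TI.
    scores = 4.0 - 1.5 * group_ti + rng.normal(0.0, 0.2, size=n_sessions)
    r, p = pearsonr(group_ti, scores)
    p_values.append(p)

adjusted = np.minimum(np.array(p_values) * len(p_values), 1.0)  # Bonferroni
print([f"{p:.2e}" for p in adjusted])
```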
Conclusions: Strong negative correlations between the group TI and the group TeamSTEPPS scores were observed in the anterior alpha activity during online hexad virtual SIMBIE. These findings suggest that anterior alpha TI may serve as an objective metric for assessing TeamSTEPPS-based team performance.
{"title":"Correlation Between Electroencephalogram Brain-to-Brain Synchronization and Team Strategies and Tools to Enhance Performance and Patient Safety Scores During Online Hexad Virtual Simulation-Based Interprofessional Education: Cross-Sectional Correlational Study.","authors":"Atthaphon Viriyopase, Khuansiri Narajeenron","doi":"10.2196/69725","DOIUrl":"10.2196/69725","url":null,"abstract":"<p><strong>Background: </strong>Team performance is crucial in crisis situations. Although the Thai version of Team Strategies and Tools to Enhance Performance and Patient Safety (TeamSTEPPS) has been validated, challenges remain due to its subjective evaluation. To date, no studies have examined the relationship between electroencephalogram (EEG) activity and team performance, as assessed by TeamSTEPPS, during virtual simulation-based interprofessional education (SIMBIE), where face-to-face communication is absent.</p><p><strong>Objective: </strong>This study aims to investigate the correlation between EEG-based brain-to-brain synchronization and TeamSTEPPS scores in multiprofessional teams participating in virtual SIMBIE sessions.</p><p><strong>Methods: </strong>This single-center study involved 90 participants (15 groups of 6 simulated professionals: 1 medical doctor, 2 nurses, 1 pharmacist, 1 medical technologist, and 1 radiological technologist). Each group completed two 30-minute virtual SIMBIE sessions focusing on team training in a crisis situation involving COVID-19 pneumonia with a difficult airway, resulting in 30 sessions in total. The TeamSTEPPS scores of each participant across 5 domains were independently assessed by 2 trained raters based on screen recording, and their average values were used. The scores of participants in the same session were aggregated to generate a group TeamSTEPPS score, representing group-level performance. EEG data were recorded using wireless EEG acquisition devices and computed for total interdependence (TI), which represents brain-to-brain synchronization. The TI values of participants in the same session were aggregated to produce a group TI, representing group-level brain-to-brain synchronization. We investigated the Pearson correlations between the TI and the scores at both the group and individual levels.</p><p><strong>Results: </strong>Interrater reliability for the TeamSTEPPS scores among 12 raters indicated good agreement on average (mean 0.73, SD 0.18; range 0.32-0.999). At the individual level, the Pearson correlations between the TI and the scores were weak and not statistically significant across all TeamSTEPPS domains (all adjusted P≥.05). However, strongly negative, statistically significant correlations between the group TI and the group TeamSTEPPS scores in the alpha frequency band (8-12 Hz) of the anterior brain area were found across all TeamSTEPPS domains after correcting for multiple comparisons (mean -0.87, SD 0.06; range -0.93 to -0.8).</p><p><strong>Conclusions: </strong>Strong negative correlations between the group TI and the group TeamSTEPPS scores were observed in the anterior alpha activity during online hexad virtual SIMBIE. 
These findings suggest that anterior alpha TI may serve as an objective metric for assessing TeamSTEPPS-based team performance.</p>","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"11 ","pages":"e69725"},"PeriodicalIF":3.2,"publicationDate":"2025-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12583944/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145337560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AI's Accuracy in Extracting Learning Experiences From Clinical Practice Logs: Observational Study.
Takeshi Kondo, Hiroshi Nishigori
Background: Improving the quality of education in clinical settings requires an understanding of learners' experiences and learning processes. However, documenting and reviewing these experiences places a significant burden on learners and educators. If learners' learning records could be automatically analyzed and their experiences visualized, this would enable real-time tracking of their progress. Large language models (LLMs) may be useful for this purpose, although their accuracy has not been sufficiently studied.
Objective: This study aimed to explore the accuracy of predicting the actual clinical experiences of medical students from their learning log data during clinical clerkship using LLMs.
Methods: This study was conducted at the Nagoya University School of Medicine. Learning log data from medical students participating in a clinical clerkship from April 22, 2024, to May 24, 2024, were used. The Model Core Curriculum for Medical Education was used as a template to extract experiences. OpenAI's ChatGPT was selected for this task after a comparison with other LLMs. Prompts were created using the learning log data and provided to ChatGPT to extract experiences, which were then listed. A web application using GPT-4-turbo was developed to automate this process. The accuracy of the extracted experiences was evaluated by comparing them with the corrected lists provided by the students.
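A minimal sketch of the extraction step, assuming the Chat Completions API with the GPT-4-turbo model named in the Methods. The curriculum item list, prompt wording, and log text are hypothetical, and the sketch assumes the model returns a bare JSON array (the study's actual prompts and web application are not reproduced here):

```python
import json
from openai import OpenAI

# Hypothetical subset of Model Core Curriculum experience items.
CURRICULUM_ITEMS = ["fever", "chest pain", "lumbar puncture", "chest x-ray"]

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_experiences(log_text: str) -> list[str]:
    prompt = (
        "From the clinical clerkship learning log below, list which of these "
        f"curriculum items the student actually experienced: {CURRICULUM_ITEMS}. "
        "Answer as a JSON array of item names only.\n\nLog:\n" + log_text
    )
    resp = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return json.loads(resp.choices[0].message.content)

print(extract_experiences("Saw a febrile patient; observed a lumbar puncture."))
```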
Results: A total of 20 sixth-year medical students participated in this study, resulting in 40 datasets. The overall Jaccard index was 0.59 (95% CI 0.46-0.71), and the Cohen κ was 0.65 (95% CI 0.53-0.76). Overall sensitivity was 62.39% (95% CI 49.96%-74.81%), and specificity was 99.34% (95% CI 98.77%-99.92%). Category-specific performance varied: symptoms showed a sensitivity of 45.43% (95% CI 25.12%-65.75%) and specificity of 98.75% (95% CI 97.31%-100%), examinations showed a sensitivity of 46.76% (95% CI 25.67%-67.86%) and specificity of 98.84% (95% CI 97.81%-99.87%), and procedures achieved a sensitivity of 56.36% (95% CI 37.64%-75.08%) and specificity of 98.92% (95% CI 96.67%-100%). The results suggest that GPT-4-turbo accurately identified many of the actual experiences but missed some because of insufficient detail or a lack of student records.
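The set-overlap metrics in the Results compare the model-extracted item list with the student-corrected reference list over the full curriculum item universe. A minimal sketch with hypothetical sets; Cohen κ would be computed analogously on per-item binary labels, eg, with sklearn.metrics.cohen_kappa_score:

```python
def set_metrics(predicted: set[str], reference: set[str], universe: set[str]):
    tp = len(predicted & reference)            # items correctly extracted
    fp = len(predicted - reference)            # items wrongly extracted
    fn = len(reference - predicted)            # items missed
    tn = len(universe - predicted - reference) # items correctly left out
    jaccard = tp / len(predicted | reference)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return jaccard, sensitivity, specificity

universe = {"fever", "chest pain", "headache", "lumbar puncture", "chest x-ray"}
pred = {"fever", "lumbar puncture"}
ref = {"fever", "lumbar puncture", "chest x-ray"}
print(set_metrics(pred, ref, universe))  # -> approximately (0.67, 0.67, 1.0)
```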
Conclusions: This study demonstrated that LLMs such as GPT-4-turbo can predict clinical experiences from learning logs with high specificity but moderate sensitivity. Future improvements in AI models, feedback to medical students on their learning logs, and combination with other data sources, such as electronic medical records, may enhance accuracy. Using artificial intelligence to analyze learning logs for assessment could reduce the burden on learners and educators while improving the quality of educational assessments in medical education.
{"title":"AI's Accuracy in Extracting Learning Experiences From Clinical Practice Logs: Observational Study.","authors":"Takeshi Kondo, Hiroshi Nishigori","doi":"10.2196/68697","DOIUrl":"10.2196/68697","url":null,"abstract":"<p><strong>Background: </strong>Improving the quality of education in clinical settings requires an understanding of learners' experiences and learning processes. However, this is a significant burden on learners and educators. If learners' learning records could be automatically analyzed and their experiences could be visualized, this would enable real-time tracking of their progress. Large language models (LLMs) may be useful for this purpose, although their accuracy has not been sufficiently studied.</p><p><strong>Objective: </strong>This study aimed to explore the accuracy of predicting the actual clinical experiences of medical students from their learning log data during clinical clerkship using LLMs.</p><p><strong>Methods: </strong>This study was conducted at the Nagoya University School of Medicine. Learning log data from medical students participating in a clinical clerkship from April 22, 2024, to May 24, 2024, were used. The Model Core Curriculum for Medical Education was used as a template to extract experiences. OpenAI's ChatGPT was selected for this task after a comparison with other LLMs. Prompts were created using the learning log data and provided to ChatGPT to extract experiences, which were then listed. A web application using GPT-4-turbo was developed to automate this process. The accuracy of the extracted experiences was evaluated by comparing them with the corrected lists provided by the students.</p><p><strong>Results: </strong>A total of 20 sixth-year medical students participated in this study, resulting in 40 datasets. The overall Jaccard index was 0.59 (95% CI 0.46-0.71), and the Cohen κ was 0.65 (95% CI 0.53-0.76). Overall sensitivity was 62.39% (95% CI 49.96%-74.81%), and specificity was 99.34% (95% CI 98.77%-99.92%). Category-specific performance varied: symptoms showed a sensitivity of 45.43% (95% CI 25.12%-65.75%) and specificity of 98.75% (95% CI 97.31%-100%), examinations showed a sensitivity of 46.76% (95% CI 25.67%-67.86%) and specificity of 98.84% (95% CI 97.81%-99.87%), and procedures achieved a sensitivity of 56.36% (95% CI 37.64%-75.08%) and specificity of 98.92% (95% CI 96.67%-100%). The results suggest that GPT-4-turbo accurately identified many of the actual experiences but missed some because of insufficient detail or a lack of student records.</p><p><strong>Conclusions: </strong>This study demonstrated that LLMs such as GPT-4-turbo can predict clinical experiences from learning logs with high specificity but moderate sensitivity. Future improvements in AI models, providing feedback to medical students' learning logs and combining them with other data sources such as electronic medical records, may enhance the accuracy. 
Using artificial intelligence to analyze learning logs for assessment could reduce the burden on learners and educators while improving the quality of educational assessments in medical education.</p>","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"11 ","pages":"e68697"},"PeriodicalIF":3.2,"publicationDate":"2025-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12529426/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145303860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}