Background: Competency-based medical education relies heavily on high-quality narrative reflections and feedback within workplace-based assessments. However, evaluating these narratives at scale remains a significant challenge.
Objective: This study aims to develop and apply natural language processing (NLP) models to evaluate the quality of resident reflections and faculty feedback documented in Entrustable Professional Activities (EPAs) on Taiwan's nationwide Emyway platform for otolaryngology residency training.
Methods: This 4-year cross-sectional study analyzed 300 randomly sampled EPA assessments from 2021 to 2025, covering a pilot year and 3 full implementation years. Two medical education experts independently rated the narratives on relevance, specificity, and the presence of reflective or improvement-focused language. Narratives were categorized into 4 quality levels (effective, moderate, ineffective, or irrelevant) and then dichotomized into high quality and low quality. We compared the performance of logistic regression, support vector machine, and bidirectional encoder representations from transformers (BERT) models in classifying narrative quality. The best-performing model was then applied to track quality trends over time.
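As an illustration of the strongest approach compared in the Methods, the following is a minimal sketch of a binary narrative-quality classifier that fine-tunes a multilingual BERT with Hugging Face Transformers. The checkpoint name, hyperparameters, and example narratives are assumptions for illustration, not the authors' exact pipeline.

```python
# Minimal sketch: fine-tune a multilingual BERT for binary narrative-quality
# classification. Checkpoint, hyperparameters, and data are illustrative
# assumptions, not the study's actual pipeline.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL = "bert-base-multilingual-cased"  # assumed multilingual checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

# Hypothetical expert-labeled narratives (1 = high quality, 0 = low quality).
data = Dataset.from_dict({
    "text": ["I reviewed my airway assessment and will verify cuff pressure "
             "before extubation next time.",
             "Good job."],
    "label": [1, 0],
})
data = data.map(lambda x: tokenizer(x["text"], truncation=True,
                                    padding="max_length", max_length=256),
                batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=data,
)
trainer.train()  # accuracy is then estimated on a held-out test split
```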
Results: The BERT model, a multilingual pretrained language model, outperformed other approaches, achieving 85% and 92% accuracy in binary classification for resident reflections and faculty feedback, respectively. The accuracy for the 4-level classification was 67% for both. Longitudinal analysis revealed significant increases in high-quality reflections (from 70.3% to 99.5%) and feedback (from 50.6% to 88.9%) over the study period.
Conclusions: BERT-based NLP demonstrated moderate-to-high accuracy in evaluating narrative quality in EPA assessments, particularly for binary classification. While not a replacement for expert review, NLP models offer a valuable tool for monitoring narrative trends and enhancing formative feedback in competency-based medical education.
Background: Stomatology education has undergone substantial transformation over recent decades. Nevertheless, the literature lacks a comprehensive summary of the field as a whole.
Objective: This study aimed to perform a bibliometric analysis to evaluate the research status, current focus, and emerging trends in this field over the last 2 decades.
Methods: We retrieved publications concerning teaching and learning in stomatology education from the Web of Science core collection covering the period from 2003 to 2023. Subsequently, we conducted a bibliometric analysis and visualization using R-Bibliometrix and CiteSpace.
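The authors worked in R-Bibliometrix and CiteSpace; as a rough Python analogue of one step, the sketch below counts author-keyword frequency by publication year from a hypothetical Web of Science tab-delimited export ("DE" and "PY" are the standard WoS field tags for author keywords and publication year; the file itself is assumed).

```python
# Sketch: keyword-by-year trend counts from an assumed Web of Science export.
import pandas as pd

df = pd.read_csv("wos_export.txt", sep="\t")

# Split the semicolon-delimited author-keyword field into one row per keyword.
kw = (df.assign(keyword=df["DE"].str.lower().str.split("; "))
        .explode("keyword")
        .dropna(subset=["keyword"]))

# Count each keyword per publication year to surface emerging terms.
trend = kw.groupby(["keyword", "PY"]).size().unstack(fill_value=0)
print(trend.loc[["deep learning", "virtual reality"]])  # assumes these occur
```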
Results: In total, 5528 publications focusing on teaching and learning in stomatology education were identified. The annual number of publications in this field has shown a consistent upward trend. The United States and the United Kingdom emerged as the leading contributors to research. Among academic institutions, the University of Iowa produced the highest number of publications. The Journal of Dental Education was identified as the most frequently cited journal. Wanchek T authored the most highly cited articles in the field. Emerging research hotspots were characterized by keywords such as "deep learning," "machine learning," "online learning," "virtual reality," and "convolutional neural network." The thematic map analysis further revealed that "surgery" and "accuracy" were categorized as emerging themes.
Conclusions: This bibliometric and visualization analysis of the literature clearly depicts the current hotspots and emerging topics concerning teaching and learning in stomatology education. The findings are intended to serve as a reference to advance the development of stomatology education research globally.
Background: Team performance is crucial in crisis situations. Although the Thai version of Team Strategies and Tools to Enhance Performance and Patient Safety (TeamSTEPPS) has been validated, challenges remain because its evaluation is subjective. To date, no studies have examined the relationship between electroencephalogram (EEG) activity and team performance, as assessed by TeamSTEPPS, during virtual simulation-based interprofessional education (SIMBIE), where face-to-face communication is absent.
Objective: This study aims to investigate the correlation between EEG-based brain-to-brain synchronization and TeamSTEPPS scores in multiprofessional teams participating in virtual SIMBIE sessions.
Methods: This single-center study involved 90 participants (15 groups of 6 simulated professionals: 1 medical doctor, 2 nurses, 1 pharmacist, 1 medical technologist, and 1 radiological technologist). Each group completed two 30-minute virtual SIMBIE sessions focusing on team training in a crisis situation involving COVID-19 pneumonia with a difficult airway, resulting in 30 sessions in total. The TeamSTEPPS scores of each participant across 5 domains were independently assessed by 2 trained raters based on screen recordings, and their average values were used. The scores of participants in the same session were aggregated to generate a group TeamSTEPPS score, representing group-level performance. EEG data were recorded using wireless EEG acquisition devices and used to compute total interdependence (TI), a measure of brain-to-brain synchronization. The TI values of participants in the same session were aggregated to produce a group TI, representing group-level brain-to-brain synchronization. We investigated the Pearson correlations between the TI and the scores at both the group and individual levels.
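To make the TI computation concrete, here is a minimal sketch of alpha-band total interdependence between two channels, using the common coherence-based definition TI = -Σ ln(1 - C_xy(f)) Δf; the sampling rate, window settings, and synthetic signals are assumptions, not the study's recording parameters.

```python
# Sketch: alpha-band total interdependence (TI) from magnitude-squared
# coherence between two EEG-like signals. All parameters are assumed.
import numpy as np
from scipy.signal import coherence

fs = 250  # assumed sampling rate (Hz)
rng = np.random.default_rng(0)
x = rng.standard_normal(fs * 60)             # stand-in for one channel
y = 0.5 * x + rng.standard_normal(fs * 60)   # partially correlated stand-in

f, Cxy = coherence(x, y, fs=fs, nperseg=fs * 2)
alpha = (f >= 8) & (f <= 12)                 # alpha band, 8-12 Hz
df_hz = f[1] - f[0]                          # frequency resolution
ti_alpha = -np.sum(np.log(1 - Cxy[alpha])) * df_hz
print(f"alpha-band TI: {ti_alpha:.3f}")
```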
Results: Interrater reliability for the TeamSTEPPS scores among 12 raters indicated good agreement on average (mean 0.73, SD 0.18; range 0.32-0.999). At the individual level, the Pearson correlations between the TI and the scores were weak and not statistically significant across all TeamSTEPPS domains (all adjusted P≥.05). At the group level, however, strong, statistically significant negative correlations between the group TI and the group TeamSTEPPS scores in the alpha frequency band (8-12 Hz) of the anterior brain area were found across all TeamSTEPPS domains after correction for multiple comparisons (mean -0.87, SD 0.06; range -0.93 to -0.80).
Conclusions: Strong negative correlations between the group TI and the group TeamSTEPPS scores were observed in the anterior alpha activity during online hexad virtual SIMBIE. These findings suggest that anterior alpha TI may serve as an objective metric for assessing TeamSTEPPS-based team performance.
Background: Improving the quality of education in clinical settings requires an understanding of learners' experiences and learning processes. However, collecting and reviewing this information places a significant burden on learners and educators. If learners' learning records could be analyzed automatically and their experiences visualized, their progress could be tracked in real time. Large language models (LLMs) may be useful for this purpose, although their accuracy has not been sufficiently studied.
Objective: This study aimed to explore the accuracy of predicting the actual clinical experiences of medical students from their learning log data during clinical clerkship using LLMs.
Methods: This study was conducted at the Nagoya University School of Medicine. Learning log data from medical students participating in a clinical clerkship from April 22, 2024, to May 24, 2024, were used. The Model Core Curriculum for Medical Education was used as a template to extract experiences. OpenAI's ChatGPT was selected for this task after a comparison with other LLMs. Prompts were created using the learning log data and provided to ChatGPT to extract experiences, which were then listed. A web application using GPT-4-turbo was developed to automate this process. The accuracy of the extracted experiences was evaluated by comparing them with the corrected lists provided by the students.
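For readers unfamiliar with this kind of pipeline, the sketch below shows one way the extraction step could be issued against the OpenAI chat API, with a curriculum item list as the template. The prompt wording, item list, and log text are illustrative assumptions; the abstract does not disclose the authors' exact prompts or application code.

```python
# Sketch: extract experienced curriculum items from a learning log via the
# OpenAI chat API. Prompt, items, and log text are hypothetical.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CORE_ITEMS = ["fever", "chest pain", "blood sampling"]  # hypothetical subset
log_text = "Saw a patient with fever; performed venipuncture under supervision."

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {"role": "system",
         "content": ("From the student's learning log, list which of these "
                     f"curriculum items were experienced: {CORE_ITEMS}. "
                     "Return a comma-separated list only.")},
        {"role": "user", "content": log_text},
    ],
)
print(response.choices[0].message.content)  # e.g., "fever, blood sampling"
```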
Results: A total of 20 sixth-year medical students participated in this study, resulting in 40 datasets. The overall Jaccard index was 0.59 (95% CI 0.46-0.71), and the Cohen κ was 0.65 (95% CI 0.53-0.76). Overall sensitivity was 62.39% (95% CI 49.96%-74.81%), and specificity was 99.34% (95% CI 98.77%-99.92%). Category-specific performance varied: symptoms showed a sensitivity of 45.43% (95% CI 25.12%-65.75%) and specificity of 98.75% (95% CI 97.31%-100%), examinations showed a sensitivity of 46.76% (95% CI 25.67%-67.86%) and specificity of 98.84% (95% CI 97.81%-99.87%), and procedures achieved a sensitivity of 56.36% (95% CI 37.64%-75.08%) and specificity of 98.92% (95% CI 96.67%-100%). The results suggest that GPT-4-turbo accurately identified many of the actual experiences but missed some because of insufficient detail or a lack of student records.
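The set-level agreement metrics reported above can be computed by comparing the extracted item set with the student-corrected list over the full curriculum template; the sketch below uses illustrative sets, not the study data.

```python
# Sketch: Jaccard index, sensitivity, and specificity for one dataset,
# treating each curriculum item as present/absent. Sets are illustrative.
TEMPLATE = {"fever", "chest pain", "dyspnea", "blood sampling", "suturing"}
predicted = {"fever", "blood sampling"}                # LLM-extracted items
corrected = {"fever", "chest pain", "blood sampling"}  # student-corrected list

tp = len(predicted & corrected)             # items both lists contain
fp = len(predicted - corrected)             # extracted but not experienced
fn = len(corrected - predicted)             # experienced but missed
tn = len(TEMPLATE - predicted - corrected)  # correctly absent items

jaccard = tp / (tp + fp + fn)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print(f"Jaccard={jaccard:.2f}, Se={sensitivity:.2f}, Sp={specificity:.2f}")
```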
Conclusions: This study demonstrated that LLMs such as GPT-4-turbo can predict clinical experiences from learning logs with high specificity but moderate sensitivity. Future improvements in AI models, feedback to medical students on their learning logs, and combination with other data sources such as electronic medical records may enhance accuracy. Using artificial intelligence to analyze learning logs for assessment could reduce the burden on learners and educators while improving the quality of educational assessments in medical education.
The release of GPT-4 Omni (GPT-4o), an advanced multimodal generative artificial intelligence (AI) model, generated substantial enthusiasm in the field of higher education. However, one year later, medical education continues to face significant challenges, demonstrating the need to move beyond initial experimentation with multimodal AI toward meaningful integration in medical education. In this Viewpoint, we argue that GPT-4o's true value lies not in novelty but in its potential to enhance training in communication skills, clinical reasoning, and procedural skills by offering real-time simulations and adaptive learning experiences using text, audio, and visual inputs in a safe, immersive, and cost-effective environment. We explore how this innovation makes it possible to address key medical education challenges by simulating realistic patient interactions, offering personalized feedback, and reducing educator workloads and costs where traditional teaching methods struggle to replicate the complexity and dynamism of real-world clinical scenarios. However, we also address the critical challenges of this approach, including data accuracy, bias, and ethical decision-making. Rather than seeing GPT-4o as a replacement, we propose its use as a strategic supplement, scaffolded into curriculum frameworks and evaluated through ongoing research. As the focus shifts from AI novelty to sustainable implementation, we call on educators, policymakers, and curriculum designers to establish governance mechanisms, pilot evaluation strategies, and develop faculty training. The future of AI in medical education depends not on the next breakthrough but on how we integrate today's tools with intention and rigor.
Background: The integration of digital technologies is becoming increasingly essential in cancer care. However, limited digital health literacy among clinical and nonclinical cancer health care professionals poses significant challenges to effective implementation and sustainability over time. To address this, the European Union is prioritizing the development of targeted digital skills training programs for cancer care providers, including the TRANSiTION project. A crucial initial step in this effort is conducting a comprehensive gap analysis to identify specific training needs.
Objective: The aim of this work is to identify training gaps and prioritize the digital skill development needs in the oncology health care workforce.
Methods: An importance-performance analysis (IPA) was conducted following a survey that assessed the performance and importance of 7 digital skills: information, communication, content creation, safety, eHealth problem-solving, ethics, and patient empowerment.
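For orientation, IPA classifies each skill by its mean importance and mean performance relative to the grand means, yielding the four classic quadrants; the sketch below illustrates this with assumed ratings, not the survey data.

```python
# Sketch: classic importance-performance analysis (IPA) quadrants with
# grand-mean cut points. The ratings are illustrative assumptions.
import numpy as np

skills = ["information", "communication", "content creation", "safety",
          "eHealth problem-solving", "ethics", "patient empowerment"]
importance = np.array([4.2, 4.4, 3.5, 4.6, 4.1, 3.9, 4.7])   # assumed means
performance = np.array([3.8, 3.9, 3.2, 3.1, 3.0, 3.6, 2.9])  # assumed means

imp_cut, perf_cut = importance.mean(), performance.mean()
for skill, imp, perf in zip(skills, importance, performance):
    if imp >= imp_cut and perf < perf_cut:
        quadrant = "concentrate here (training priority)"
    elif imp >= imp_cut:
        quadrant = "keep up the good work"
    elif perf < perf_cut:
        quadrant = "low priority"
    else:
        quadrant = "possible overkill"
    print(f"{skill}: {quadrant}")
```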
Results: A total of 67 participants from 11 European countries completed the study: 38 clinical professionals (CP), 16 nonclinical professionals (NCP), and 13 patients or caregivers (PC). CP acknowledged the need for a comprehensive training program that includes all 7 digital skills. Digital patient empowerment and safety skills emerged as the highest priorities for both CP and NCP. Conversely, NCP assigned a lower priority to digital content creation skills, and PC assigned a lower priority to digital information and ethical skills. The IPA also revealed discrepancies in digital communication skills across groups (H=6.50; P=.04).
Conclusions: The study highlighted the pressing need for comprehensive digital skills training for cancer health care professionals across diverse backgrounds and health care systems in Europe, tailored to their occupation and care setting. Incorporating PC perspectives ensures a balanced approach to addressing these training gaps. These findings provide a valuable knowledge base for designing digital skills training programs, promoting a holistic approach that integrates the perspectives of the various stakeholders involved in digital cancer care.