Empirical effect of the Dr LEE Jong-wook Fellowship Program to empower sustainable change for the health workforce in Tanzania: a mixed-methods study
Pub Date: 2025-01-01. Epub Date: 2025-01-20. DOI: 10.3352/jeehp.2025.22.6
Masoud Dauda, Swabaha Aidarus Yusuph, Harouni Yasini, Issa Mmbaga, Perpetua Mwambinngu, Hansol Park, Gyeongbae Seo, Kyoung Kyun Oh
Purpose: This study evaluated the Dr LEE Jong-wook Fellowship Program’s impact on Tanzania’s health workforce, focusing on relevance, effectiveness, efficiency, impact, and sustainability in addressing healthcare gaps.
Methods: A mixed-methods research design was employed. Data were collected from 97 out of 140 alumni through an online survey, 35 in-depth interviews, and one focus group discussion. The study was conducted from November to December 2023 and included alumni from 2009 to 2022. Measurement instruments included structured questionnaires for quantitative data and semi-structured guides for qualitative data. Quantitative analysis involved descriptive and inferential statistics (Spearman’s rank correlation, non-parametric tests) using Python ver. 3.11.0 and Stata ver. 14.0. Thematic analysis was employed to analyze qualitative data using NVivo ver. 12.0.
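The Methods note that Spearman's rank correlation was computed in Python. A minimal sketch of that step is shown below; the pandas DataFrame, column names, and scores are illustrative assumptions, not the study's data.

```python
# Hypothetical sketch: Spearman's rank correlation between two criterion scores,
# mirroring the effectiveness-impact correlation reported in this study.
import pandas as pd
from scipy.stats import spearmanr

alumni = pd.DataFrame({                       # invented alumni-level scores
    "effectiveness": [88, 92, 79, 85, 90, 74, 95, 81],
    "impact":        [86, 94, 75, 88, 91, 70, 96, 83],
})

rho, p_value = spearmanr(alumni["effectiveness"], alumni["impact"])
print(f"Spearman rho={rho:.3f}, P={p_value:.4f}")
```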
Results: Findings indicated high relevance (mean=91.6, standard deviation [SD]=8.6), effectiveness (mean=86.1, SD=11.2), efficiency (mean=82.7, SD=10.2), and impact (mean=87.7, SD=9.9), with improved skills, confidence, and institutional service quality. However, sustainability had a lower score (mean=58.0, SD=11.1), reflecting challenges in follow-up support and resource allocation. Effectiveness strongly correlated with impact (ρ=0.746, P<0.001). The qualitative findings revealed that participants valued tailored training but highlighted barriers, such as language challenges and insufficient practical components. Alumni-led initiatives contributed to knowledge sharing, but limited resources constrained sustainability.
Conclusion: The Fellowship Program enhanced Tanzania’s health workforce capacity, but it requires localized curricula and strengthened alumni networks for sustainability. These findings provide actionable insights for improving similar programs globally, confirming the hypothesis that tailored training positively influences workforce and institutional outcomes.
{"title":"Empirical effect of the Dr LEE Jong-wook Fellowship Program to empower sustainable change for the health workforce in Tanzania: a mixed-methods study","authors":"Masoud Dauda, Swabaha Aidarus Yusuph, Harouni Yasini, Issa Mmbaga, Perpetua Mwambinngu, Hansol Park, Gyeongbae Seo, Kyoung Kyun Oh","doi":"10.3352/jeehp.2025.22.6","DOIUrl":"10.3352/jeehp.2025.22.6","url":null,"abstract":"<p><strong>Purpose: </strong>This study evaluated the Dr LEE Jong-wook Fellowship Program’s impact on Tanzania’s health workforce, focusing on relevance, effectiveness, efficiency, impact, and sustainability in addressing healthcare gaps.</p><p><strong>Methods: </strong>A mixed-methods research design was employed. Data were collected from 97 out of 140 alumni through an online survey, 35 in-depth interviews, and one focus group discussion. The study was conducted from November to December 2023 and included alumni from 2009 to 2022. Measurement instruments included structured questionnaires for quantitative data and semi-structured guides for qualitative data. Quantitative analysis involved descriptive and inferential statistics (Spearman’s rank correlation, non-parametric tests) using Python ver. 3.11.0 and Stata ver. 14.0. Thematic analysis was employed to analyze qualitative data using NVivo ver. 12.0.</p><p><strong>Results: </strong>Findings indicated high relevance (mean=91.6, standard deviation [SD]=8.6), effectiveness (mean=86.1, SD=11.2), efficiency (mean=82.7, SD=10.2), and impact (mean=87.7, SD=9.9), with improved skills, confidence, and institutional service quality. However, sustainability had a lower score (mean=58.0, SD=11.1), reflecting challenges in follow-up support and resource allocation. Effectiveness strongly correlated with impact (ρ=0.746, P<0.001). The qualitative findings revealed that participants valued tailored training but highlighted barriers, such as language challenges and insufficient practical components. Alumni-led initiatives contributed to knowledge sharing, but limited resources constrained sustainability.</p><p><strong>Conclusion: </strong>The Fellowship Program enhanced Tanzania’s health workforce capacity, but it requires localized curricula and strengthened alumni networks for sustainability. These findings provide actionable insights for improving similar programs globally, confirming the hypothesis that tailored training positively influences workforce and institutional outcomes.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"22 ","pages":"6"},"PeriodicalIF":9.3,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12003955/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143013022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Empathy and tolerance of ambiguity in medical students and doctors participating in art-based observational training at the Rijksmuseum in Amsterdam, the Netherlands: a before-and-after study
Pub Date: 2025-01-01. Epub Date: 2025-01-14. DOI: 10.3352/jeehp.2025.22.3
Stella Anna Bult, Thomas van Gulik
Purpose: This research presents an experimental study using validated questionnaires to quantitatively assess the outcomes of art-based observational training in medical students, residents, and specialists. The study tested the hypothesis that art-based observational training would lead to measurable effects on judgement skills (tolerance of ambiguity) and empathy in medical students and doctors.
Methods: An experimental cohort study with pre- and post-intervention assessments was conducted using validated questionnaires and qualitative evaluation forms to examine the outcomes of art-based observational training in medical students and doctors. Between December 2023 and June 2024, 15 art courses were conducted in the Rijksmuseum in Amsterdam. Participants were assessed on empathy using the Jefferson Scale of Empathy (JSE) and tolerance of ambiguity using the Tolerance of Ambiguity in Medical Students and Doctors (TAMSAD) scale.
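The abstract does not name the specific pre/post test used; the hedged sketch below shows how such a paired comparison of JSE scores could be run in Python, with invented scores.

```python
# Illustrative paired pre/post comparison of JSE scores (all data are fabricated).
import numpy as np
from scipy.stats import ttest_rel, wilcoxon

pre_jse  = np.array([105, 112, 98, 120, 101, 117, 109])
post_jse = np.array([110, 115, 101, 124, 103, 122, 112])

print(f"Mean gain = {np.mean(post_jse - pre_jse):.2f} points")
t_stat, p_t = ttest_rel(post_jse, pre_jse)     # parametric option
w_stat, p_w = wilcoxon(post_jse, pre_jse)      # non-parametric option
print(f"Paired t-test: t={t_stat:.2f}, P={p_t:.4f}; Wilcoxon: W={w_stat:.1f}, P={p_w:.4f}")
```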
Results: In total, 91 participants were included; 29 completed the JSE and 62 completed the TAMSAD. The results showed statistically significant post-test increases in mean JSE and TAMSAD scores (3.71 points on the JSE, which ranges from 20 to 140, and 1.86 points on the TAMSAD, which ranges from 0 to 100). The qualitative findings were predominantly positive.
Conclusion: The results suggest that incorporating art-based observational training in medical education improves empathy and tolerance of ambiguity. This study highlights the importance of art-based observational training in medical education in the professional development of medical students and doctors.
{"title":"Empathy and tolerance of ambiguity in medical students and doctors participating in art-based observational training at the Rijksmuseum in Amsterdam, the Netherlands: a before-and-after study","authors":"Stella Anna Bult, Thomas van Gulik","doi":"10.3352/jeehp.2025.22.3","DOIUrl":"10.3352/jeehp.2025.22.3","url":null,"abstract":"<p><strong>Purpose: </strong>This research presents an experimental study using validated questionnaires to quantitatively assess the outcomes of art-based observational training in medical students, residents, and specialists. The study tested the hypothesis that art-based observational training would lead to measurable effects on judgement skills (tolerance of ambiguity) and empathy in medical students and doctors.</p><p><strong>Methods: </strong>An experimental cohort study with pre- and post-intervention assessments was conducted using validated questionnaires and qualitative evaluation forms to examine the outcomes of art-based observational training in medical students and doctors. Between December 2023 and June 2024, 15 art courses were conducted in the Rijksmuseum in Amsterdam. Participants were assessed on empathy using the Jefferson Scale of Empathy (JSE) and tolerance of ambiguity using the Tolerance of Ambiguity in Medical Students and Doctors (TAMSAD) scale.</p><p><strong>Results: </strong>In total, 91 participants were included; 29 participants completed the JSE and 62 completed the TAMSAD scales. The results showed statistically significant post-test increases for mean JSE and TAMSAD scores (3.71 points for the JSE, ranging from 20 to 140, and 1.86 points for the TAMSAD, ranging from 0 to 100). The qualitative findings were predominantly positive.</p><p><strong>Conclusion: </strong>The results suggest that incorporating art-based observational training in medical education improves empathy and tolerance of ambiguity. This study highlights the importance of art-based observational training in medical education in the professional development of medical students and doctors.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"22 ","pages":"3"},"PeriodicalIF":9.3,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11880821/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142980319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Comparison between GPT-4 and human raters in grading pharmacy students' exam responses in Malaysia: a cross-sectional study
Pub Date: 2025-01-01. Epub Date: 2025-07-28. DOI: 10.3352/jeehp.2025.22.20
Wuan Shuen Yap, Pui San Saw, Li Ling Yeap, Shaun Wen Huey Lee, Wei Jin Wong, Ronald Fook Seng Lee
Purpose: Manual grading is time-consuming and prone to inconsistencies, prompting the exploration of generative artificial intelligence tools such as GPT-4 to enhance efficiency and reliability. This study investigated GPT-4's potential in grading pharmacy students' exam responses, focusing on the impact of optimized prompts. Specifically, it evaluated the alignment between GPT-4 and human raters, assessed GPT-4's consistency over time, and determined its error rates in grading pharmacy students' exam responses.
Methods: We conducted a comparative study using past exam responses graded by university-trained raters and by GPT-4. Responses were randomized before evaluation by GPT-4, accessed via a Plus account between April and September 2024. Prompt optimization was performed on 16 responses, followed by evaluation of 3 prompt delivery methods. We then applied the optimized approach across 4 item types. Intraclass correlation coefficients and error analyses were used to assess consistency and agreement between GPT-4 and human ratings.
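The software used for the intraclass correlation is not stated in the abstract; the sketch below assumes the pingouin package and a long-format table with one row per response-rater pair, all invented for illustration.

```python
# Hedged sketch: agreement between GPT-4 and a human rater via an intraclass
# correlation coefficient (ICC). Data and column names are assumptions.
import pandas as pd
import pingouin as pg

ratings = pd.DataFrame({
    "response": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],     # graded exam responses
    "rater":    ["human", "gpt4"] * 5,
    "score":    [8, 7, 5, 5, 9, 8, 6, 7, 4, 4],
})

icc = pg.intraclass_corr(data=ratings, targets="response",
                         raters="rater", ratings="score")
print(icc[["Type", "ICC", "CI95%"]])
```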
Results: GPT-4's ratings aligned reasonably well with human raters, demonstrating moderate to excellent reliability (intraclass correlation coefficient=0.617 to 0.933), depending on item type and the optimized prompt. When stratified by grade bands, GPT-4 was less consistent in marking high-scoring responses (Z values ranging from -5.71 to 4.62, P<0.001). Overall, despite achieving substantial alignment with human raters in many cases, discrepancies across item types and a tendency to commit basic errors necessitate continued educator involvement to ensure grading accuracy.
Conclusion: With optimized prompts, GPT-4 shows promise as a supportive tool for grading pharmacy students' exam responses, particularly for objective tasks. However, its limitations, including errors and variability in grading high-scoring responses, require ongoing human oversight. Future research should explore advanced generative artificial intelligence models and broader assessment formats to further enhance grading reliability.
{"title":"Comparison between GPT-4 and human raters in grading pharmacy students' exam responses in Malaysia: a cross-sectional study.","authors":"Wuan Shuen Yap, Pui San Saw, Li Ling Yeap, Shaun Wen Huey Lee, Wei Jin Wong, Ronald Fook Seng Lee","doi":"10.3352/jeehp.2025.22.20","DOIUrl":"https://doi.org/10.3352/jeehp.2025.22.20","url":null,"abstract":"<p><strong>Purpose: </strong>Manual grading is time-consuming and prone to inconsistencies, prompting the exploration of generative artificial intelligence tools such as GPT-4 to enhance efficiency and reliability. This study investigated GPT-4's potential in grading pharmacy students' exam responses, focusing on the impact of optimized prompts. Specifically, it evaluated the alignment between GPT-4 and human raters, assessed GPT-4's consistency over time, and determined its error rates in grading pharmacy students' exam responses.</p><p><strong>Methods: </strong>We conducted a comparative study using past exam responses graded by university-trained raters and by GPT-4. Responses were randomized before evaluation by GPT-4, accessed via a Plus account between April and September 2024. Prompt optimization was performed on 16 responses, followed by evaluation of 3 prompt delivery methods. We then applied the optimized approach across 4 item types. Intraclass correlation coefficients and error analyses were used to assess consistency and agreement between GPT-4 and human ratings.</p><p><strong>Results: </strong>GPT-4's ratings aligned reasonably well with human raters, demonstrating moderate to excellent reliability (intraclass correlation coefficient=0.617-0.933), depending on item type and the optimized prompt. When stratified by grade bands, GPT-4 was less consistent in marking high-scoring responses (Z=-5.71-4.62, P<0.001). Overall, despite achieving substantial alignment with human raters in many cases, discrepancies across item types and a tendency to commit basic errors necessitate continued educator involvement to ensure grading accuracy.</p><p><strong>Conclusion: </strong>With optimized prompts, GPT-4 shows promise as a supportive tool for grading pharmacy students' exam responses, particularly for objective tasks. However, its limitations-including errors and variability in grading high-scoring responses-require ongoing human oversight. Future research should explore advanced generative artificial intelligence models and broader assessment formats to further enhance grading reliability.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"22 ","pages":"20"},"PeriodicalIF":3.7,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145151398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Longitudinal relationships between Korean medical students' academic performance in medical knowledge and clinical performance examinations: a retrospective longitudinal study
Pub Date: 2025-01-01. Epub Date: 2025-06-10. DOI: 10.3352/jeehp.2025.22.18
Yulim Kang, Hae Won Kim
Purpose: This study investigated the longitudinal relationships between performance on 3 examinations assessing medical knowledge and clinical skills among Korean medical students in the clinical phase. This study addressed the stability of each examination score and the interrelationships among examinations over time.
Methods: A retrospective longitudinal study was conducted at Yonsei University College of Medicine in Korea with a cohort of 112 medical students over 2 years. The students were in their third year in 2022 and progressed to the fourth year in 2023. We obtained comprehensive clinical science examination (CCSE) and progress test (PT) scores 3 times (T1-T3), and clinical performance examination (CPX) scores twice (T1 and T2). Autoregressive cross-lagged models were fitted to analyze their relationships.
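The abstract does not name the SEM software used; the sketch below shows one way an autoregressive cross-lagged specification could be written with the Python semopy package, using synthetic scores and the abstract's T1-T3 variable labels as assumptions.

```python
# Hedged sketch of an autoregressive cross-lagged panel model in semopy.
import numpy as np
import pandas as pd
from semopy import Model

desc = """
ccse_t2 ~ ccse_t1 + pt_t1 + cpx_t1
pt_t2   ~ pt_t1 + ccse_t1 + cpx_t1
cpx_t2  ~ cpx_t1 + ccse_t1 + pt_t1
ccse_t3 ~ ccse_t2 + pt_t2 + cpx_t2
pt_t3   ~ pt_t2 + ccse_t2 + cpx_t2
"""

rng = np.random.default_rng(1)
base = rng.normal(70, 8, size=(112, 1))          # shared ability factor, n=112
cols = ["ccse_t1", "pt_t1", "cpx_t1", "ccse_t2", "pt_t2", "cpx_t2", "ccse_t3", "pt_t3"]
scores = pd.DataFrame(
    np.hstack([base + rng.normal(0, 4, size=(112, 1)) for _ in cols]), columns=cols)

model = Model(desc)
model.fit(scores)
print(model.inspect())    # path estimates with standard errors and P-values
```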
Results: For each of the 3 examinations, the score at 1 time point predicted the subsequent score. Regarding cross-lagged effects, the CCSE at T1 predicted PT at T2 (β=0.472, P<0.001) and CCSE at T2 predicted PT at T3 (β=0.527, P<0.001). The CPX at T1 predicted the CCSE at T2 (β=0.163, P=0.006), and the CPX at T2 predicted the CCSE at T3 (β=0.154, P=0.006). The PT at T1 predicted the CPX at T2 (β=0.273, P=0.006).
Conclusion: The study identified each examination's stability and the complexity of the longitudinal relationships between them. These findings may help predict medical students' performance on subsequent examinations, potentially informing the provision of necessary student support.
{"title":"Longitudinal relationships between Korean medical students' academic performance in medical knowledge and clinical performance examinations: a retrospective longitudinal study.","authors":"Yulim Kang, Hae Won Kim","doi":"10.3352/jeehp.2025.22.18","DOIUrl":"10.3352/jeehp.2025.22.18","url":null,"abstract":"<p><strong>Purpose: </strong>This study investigated the longitudinal relationships between performance on 3 examinations assessing medical knowledge and clinical skills among Korean medical students in the clinical phase. This study addressed the stability of each examination score and the interrelationships among examinations over time.</p><p><strong>Methods: </strong>A retrospective longitudinal study was conducted at Yonsei University College of Medicine in Korea with a cohort of 112 medical students over 2 years. The students were in their third year in 2022 and progressed to the fourth year in 2023. We obtained comprehensive clinical science examination (CCSE) and progress test (PT) scores 3 times (T1-T3), and clinical performance examination (CPX) scores twice (T1 and T2). Autoregressive cross-lagged models were fitted to analyze their relationships.</p><p><strong>Results: </strong>For each of the 3 examinations, the score at 1 time point predicted the subsequent score. Regarding cross-lagged effects, the CCSE at T1 predicted PT at T2 (β=0.472, P<0.001) and CCSE at T2 predicted PT at T3 (β=0.527, P<0.001). The CPX at T1 predicted the CCSE at T2 (β=0.163, P=0.006), and the CPX at T2 predicted the CCSE at T3 (β=0.154, P=0.006). The PT at T1 predicted the CPX at T2 (β=0.273, P=0.006).</p><p><strong>Conclusion: </strong>The study identified each examination's stability and the complexity of the longitudinal relationships between them. These findings may help predict medical students' performance on subsequent examinations, potentially informing the provision of necessary student support.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"22 ","pages":"18"},"PeriodicalIF":3.7,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12365683/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144267588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Validity of the formative physical therapy Student and Clinical Instructor Performance Instrument in the United States: a quasi-experimental, time-series study
Pub Date: 2025-01-01. Epub Date: 2025-09-26. DOI: 10.3352/jeehp.2025.22.26
Sean Gallivan, Jamie Bayliss
Purpose: The aim of this study was to assess the validity of the Student and Clinical Instructor Performance Instrument (SCIPAI), a novel formative tool used in physical therapist education to assess student and clinical instructor (CI) performance throughout clinical education experiences (CEEs). The researchers hypothesized that the SCIPAI would demonstrate concurrent, predictive, and construct validity while offering additional contemporary validity evidence.
Methods: In this quasi-experimental, time-series study, 811 student-CI pairs completed 2 SCIPAIs before and 2 after the midpoint of each CEE, plus an endpoint Clinical Performance Instrument (CPI), across CEEs ranging from beginning to terminal experiences within a 1-year period. Spearman rank correlation analyses compared final SCIPAI and CPI like-item scores to assess concurrent validity, and earlier SCIPAI and final CPI like-item scores to assess predictive validity. Construct validity was assessed via the progression of student and CI performance scores within CEEs using Wilcoxon signed-rank testing. No randomization or grouping of subjects occurred.
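A minimal sketch of the two named tests is given below, using scipy with fabricated scores; the variable names are assumptions, not the study's data.

```python
# Hedged sketch: within-CEE progression (Wilcoxon signed-rank) and concurrent
# validity (Spearman correlation of like-item scores). All numbers are invented.
import numpy as np
from scipy.stats import spearmanr, wilcoxon

scipai_1 = np.array([62, 70, 55, 68, 74, 60])       # first SCIPAI in a CEE
scipai_4 = np.array([75, 82, 70, 77, 85, 72])       # final SCIPAI in the same CEE
stat, p = wilcoxon(scipai_4, scipai_1)
print(f"Progression: Wilcoxon W={stat:.1f}, P={p:.4f}")

cpi_final = np.array([78, 85, 72, 80, 88, 74])      # endpoint CPI like-item scores
rho, p_rho = spearmanr(scipai_4, cpi_final)
print(f"Concurrent validity: Spearman rho={rho:.3f}, P={p_rho:.4f}")
```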
Results: Moderate correlations existed between like items on the final SCIPAI and the CPI (P<0.005) and between some like items on earlier SCIPAIs and the final CPI (P<0.005). Student performance scores progressed from SCIPAI 1 to SCIPAI 4 within CEEs (P<0.005). Although more CIs showed progression than regression in performance from SCIPAI 1 to SCIPAI 4, the larger magnitude of the decreases produced an overall decline in aggregate CI performance ratings (P<0.005).
Conclusion: The SCIPAI demonstrates concurrent, predictive, and construct validity when used by students and CIs to rate student performance at regular points throughout clinical education experiences.
{"title":"Validity of the formative physical therapy Student and Clinical Instructor Performance Instrument in the United States: a quasi-experimental, time-series study.","authors":"Sean Gallivan, Jamie Bayliss","doi":"10.3352/jeehp.2025.22.26","DOIUrl":"10.3352/jeehp.2025.22.26","url":null,"abstract":"<p><strong>Purpose: </strong>The aim of this study was to assess the validity of the Student and Clinical Instructor Performance Instrument (SCIPAI), a novel formative tool used in physical therapist education to assess student and clinical instructor (CI) performance throughout clinical education experiences (CEEs). The researchers hypothesized that the SCIPAI would demonstrate concurrent, predictive, and construct validity while offering additional contemporary validity evidence.</p><p><strong>Methods: </strong>This quasi-experimental, time-series study had 811 student-CI pairs complete 2 SCIPAIs before after CEE midpoint, and an endpoint Clinical Performance Instrument (CPI) during beginning to terminal CEEs in a 1-year period. Spearman rank correlation analyses used final SCIPAI and CPI like-item scores to assess concurrent validity; and earlier SCIPAI and final CPI like-item scores to assess predictive validity. Construct validity was assessed via progression of student and CI performance scores within CEEs using Wilcoxon signed-rank testing. No randomization/grouping of subjects occurred.</p><p><strong>Results: </strong>Moderate correlation existed between like final SCIPAI and CPI items (P<0.005) and between some like items of earlier SCIPAIs and final CPIs (P<0.005). Student performance scores demonstrated progress from SCIPAIs 1 to 4 within CEEs (P<0.005). While a greater number of CIs demonstrated progression rather than regression in performance from SCIPAI 1 to SCIPAI 4, the greater magnitude of decreases in CI performance contributed to an aggregate ratings decrease of CI performance (P<0.005).</p><p><strong>Conclusion: </strong>The SCIPAI demonstrates concurrent, predictive, and construct validity when used by students and CIs to rate student performance at regular points throughout clinical education experiences.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"22 ","pages":"26"},"PeriodicalIF":3.7,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12688320/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145150958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mixed reality versus manikins in basic life support simulation-based training for medical students in France: the mixed reality non-inferiority randomized controlled trial
Pub Date: 2025-01-01. Epub Date: 2025-05-12. DOI: 10.3352/jeehp.2025.22.15
Sofia Barlocco De La Vega, Evelyne Guerif-Dubreucq, Jebrane Bouaoud, Myriam Awad, Léonard Mathon, Agathe Beauvais, Thomas Olivier, Pierre-Clément Thiébaud, Anne-Laure Philippon
Purpose: To compare the effectiveness of mixed reality with traditional manikin-based simulation in basic life support (BLS) training, testing the hypothesis that mixed reality is non-inferior to manikin-based simulation.
Methods: A non-inferiority randomized controlled trial was conducted. Third-year medical students were randomized into 2 groups. The mixed reality group received 32 minutes of individual training using a virtual reality headset and a torso for chest compressions (CC). The manikin group participated in 2 hours of group training consisting of theoretical and practical sessions using a low-fidelity manikin. The primary outcome was the overall BLS performance score, assessed at 1 month through a standardized BLS scenario using a 10-item assessment scale. The quality of CC, student satisfaction, and confidence levels were secondary outcomes and assessed through superiority analyses.
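The results report a one-sided 95% CI for the difference in mean BLS scores, which is the usual non-inferiority check; the sketch below illustrates that logic with invented scores and an assumed margin (the trial's actual margin is not given in the abstract).

```python
# Hedged sketch of a non-inferiority check on the primary outcome: mean difference
# (mixed reality minus manikin) with a one-sided 95% lower confidence bound,
# compared against an assumed margin. Scores and margin are illustrative only.
import numpy as np
from scipy import stats

mixed_reality = np.array([6.0, 7.0, 6.5, 5.5, 7.5, 6.0, 6.5, 7.0])
manikin       = np.array([6.5, 7.0, 6.0, 6.5, 7.0, 6.5, 6.0, 7.5])
margin = -1.0                      # assumed non-inferiority margin (10-point scale)

diff = mixed_reality.mean() - manikin.mean()
v1 = mixed_reality.var(ddof=1) / len(mixed_reality)
v2 = manikin.var(ddof=1) / len(manikin)
se = np.sqrt(v1 + v2)
df = (v1 + v2) ** 2 / (v1**2 / (len(mixed_reality) - 1) + v2**2 / (len(manikin) - 1))
lower = diff - stats.t.ppf(0.95, df) * se        # one-sided 95% lower bound (Welch)
print(f"Difference={diff:.2f}, lower bound={lower:.2f}")
print("Non-inferior" if lower > margin else "Inconclusive")
```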
Results: Data from 155 participants were analyzed, with 84 in the mixed reality group and 71 in the manikin group. The mean overall BLS performance score was 6.4 in the mixed reality group vs. 6.5 in the manikin group (mean difference, -0.1; 95% confidence interval [CI], -0.45 to +∞). CC depth was greater in the manikin group (50.3 mm vs. 46.6 mm; mean difference, -3.7 mm; 95% CI, -6.5 to -0.9), with 61.2% achieving optimal depth compared to 43.8% in the mixed reality group (mean difference, -17.4%; 95% CI, -29.3 to -5.5). Satisfaction was higher in the mixed reality group (4.9/5 vs. 4.7/5 in the manikin group; difference, 0.2; 95% CI, 0.07 to 0.33), as was confidence in performing BLS (3.9/5 vs. 3.6/5; difference, 0.3; 95% CI, 0.11 to 0.58). No other significant differences were observed for secondary outcomes.
Conclusion: Mixed reality is non-inferior to manikin simulation in terms of overall BLS performance score assessed at 1 month.
{"title":"Mixed reality versus manikins in basic life support simulation-based training for medical students in France: the mixed reality non-inferiority randomized controlled trial.","authors":"Sofia Barlocco De La Vega, Evelyne Guerif-Dubreucq, Jebrane Bouaoud, Myriam Awad, Léonard Mathon, Agathe Beauvais, Thomas Olivier, Pierre-Clément Thiébaud, Anne-Laure Philippon","doi":"10.3352/jeehp.2025.22.15","DOIUrl":"10.3352/jeehp.2025.22.15","url":null,"abstract":"<p><strong>Purpose: </strong>To compare the effectiveness of mixed reality with traditional manikin-based simulation in basic life support (BLS) training, testing the hypothesis that mixed reality is non-inferior to manikin-based simulation.</p><p><strong>Methods: </strong>A non-inferiority randomized controlled trial was conducted. Third-year medical students were randomized into 2 groups. The mixed reality group received 32 minutes of individual training using a virtual reality headset and a torso for chest compressions (CC). The manikin group participated in 2 hours of group training consisting of theoretical and practical sessions using a low-fidelity manikin. The primary outcome was the overall BLS performance score, assessed at 1 month through a standardized BLS scenario using a 10-item assessment scale. The quality of CC, student satisfaction, and confidence levels were secondary outcomes and assessed through superiority analyses.</p><p><strong>Results: </strong>Data from 155 participants were analyzed, with 84 in the mixed reality group and 71 in the manikin group. The mean overall BLS performance score was 6.4 (mixed reality) vs. 6.5 (manikin), (mean difference, -0.1; 95% confidence interval [CI], -0.45 to +∞). CC depth was greater in the manikin group (50.3 mm vs. 46.6 mm; mean difference, -3.7 mm; 95% CI, -6.5 to -0.9), with 61.2% achieving optimal depth compared to 43.8% in the mixed reality group (mean difference, 17.4%; 95% CI, -29.3 to -5.5). Satisfaction was higher in the mixed reality group (4.9/5 vs. 4.7/5 in the manikin group; difference, 0.2; 95% CI, 0.07 to 0.33), as was confidence in performing BLS (3.9/5 vs. 3.6/5; difference, 0.3; 95% CI, 0.11 to 0.58). No other significant differences were observed for secondary outcomes.</p><p><strong>Conclusion: </strong>Mixed reality is non-inferior to manikin simulation in terms of overall BLS performance score assessed at 1 month.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"22 ","pages":"15"},"PeriodicalIF":9.3,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144040345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Correlation between task-based checklists and global rating scores in undergraduate objective structured clinical examinations in Saudi Arabia: a 1-year comparative study
Pub Date: 2025-01-01. Epub Date: 2025-06-19. DOI: 10.3352/jeehp.2025.22.19
Uzma Khan, Yasir Naseem Khan
Purpose: This study investigated the correlation between task-based checklist scores and global rating scores (GRS) in objective structured clinical examinations (OSCEs) for fourth-year undergraduate medical students and aimed to determine whether both methods can be reliably used in a standard setting.
Methods: A comparative observational study was conducted at Al Rayan College of Medicine, Saudi Arabia, involving 93 fourth-year students during the 2023-2024 academic year. OSCEs from 2 General Practice courses were analyzed, each comprising 10 stations assessing clinical competencies. Students were scored using both task-specific checklists and holistic 5-point GRS. Reliability was evaluated using Cronbach's α, and the relationship between the 2 scoring methods was assessed using the coefficient of determination (R2). Ethical approval and informed consent were obtained.
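A compact sketch of the two statistics is shown below with fabricated station scores; the Cronbach's alpha formula is the standard one, and R2 is taken from a simple linear fit of GRS on checklist scores, which is an assumption about how the agreement was modeled.

```python
# Illustrative sketch: Cronbach's alpha across 10 OSCE stations and R2 between
# checklist and global rating scores for one station. All data are invented.
import numpy as np
from scipy.stats import linregress

def cronbach_alpha(items: np.ndarray) -> float:
    """Rows = students, columns = stations."""
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum() / items.sum(axis=1).var(ddof=1))

rng = np.random.default_rng(42)
ability = rng.normal(0, 1, size=(93, 1))                     # shared competence factor
stations = 7.5 + ability + rng.normal(0, 0.7, size=(93, 10))
print(f"Cronbach's alpha = {cronbach_alpha(stations):.2f}")

checklist = stations[:, 0]
grs = 3.0 + 0.25 * checklist + rng.normal(0, 0.3, size=93)   # hypothetical GRS ratings
print(f"R2 (checklist vs. GRS) = {linregress(checklist, grs).rvalue ** 2:.2f}")
```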
Results: The mean OSCE score was 76.7 in Course 1 (Cronbach's α=0.85) and 73.0 in Course 2 (Cronbach's α=0.81). R2 values varied by station and competency. Strong correlations were observed in procedural and management skills (R2 up to 0.87), while weaker correlations appeared in history-taking stations (R2 as low as 0.35). The variability across stations highlighted the context-dependence of alignment between checklist and GRS methods.
Conclusion: Both checklists and GRS exhibit reliable psychometric properties. Their combined use improves validity in OSCE scoring, but station-specific application is recommended. Checklists may anchor pass/fail decisions, while GRS may assist in assessing borderline performance. This hybrid model increases fairness and reflects clinical authenticity in competency-based assessment.
{"title":"Correlation between task-based checklists and global rating scores in undergraduate objective structured clinical examinations in Saudi Arabia: a 1-year comparative study.","authors":"Uzma Khan, Yasir Naseem Khan","doi":"10.3352/jeehp.2025.22.19","DOIUrl":"10.3352/jeehp.2025.22.19","url":null,"abstract":"<p><strong>Purpose: </strong>This study investigated the correlation between task-based checklist scores and global rating scores (GRS) in objective structured clinical examinations (OSCEs) for fourth-year undergraduate medical students and aimed to determine whether both methods can be reliably used in a standard setting.</p><p><strong>Methods: </strong>A comparative observational study was conducted at Al Rayan College of Medicine, Saudi Arabia, involving 93 fourth-year students during the 2023-2024 academic year. OSCEs from 2 General Practice courses were analyzed, each comprising 10 stations assessing clinical competencies. Students were scored using both task-specific checklists and holistic 5-point GRS. Reliability was evaluated using Cronbach's α, and the relationship between the 2 scoring methods was assessed using the coefficient of determination (R2). Ethical approval and informed consent were obtained.</p><p><strong>Results: </strong>The mean OSCE score was 76.7 in Course 1 (Cronbach's α=0.85) and 73.0 in Course 2 (Cronbach's α=0.81). R2 values varied by station and competency. Strong correlations were observed in procedural and management skills (R2 up to 0.87), while weaker correlations appeared in history-taking stations (R2 as low as 0.35). The variability across stations highlighted the context-dependence of alignment between checklist and GRS methods.</p><p><strong>Conclusion: </strong>Both checklists and GRS exhibit reliable psychometric properties. Their combined use improves validity in OSCE scoring, but station-specific application is recommended. Checklists may anchor pass/fail decisions, while GRS may assist in assessing borderline performance. This hybrid model increases fairness and reflects clinical authenticity in competency-based assessment.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"22 ","pages":"19"},"PeriodicalIF":3.7,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12365684/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144776471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Leveraging feedback mechanisms to improve the quality of objective structured clinical examinations in Singapore: an exploratory action research study
Pub Date: 2025-01-01. Epub Date: 2025-09-30. DOI: 10.3352/jeehp.2025.22.28
Han Ting Jillian Yeo, Dujeepa Dasharatha Samarasekera, Michael Dean
Purpose: Variability in examiner scoring threatens the fairness and reliability of objective structured clinical examinations (OSCEs). While examiner standardization exists, there is currently no structured, psychometric-informed, individualized feedback mechanism for examiners. This study explored the feasibility and perceived value of such a mechanism using an action research approach to co-design and iteratively refine examiner feedback reports.
Methods: Two exploratory cycles were conducted between November 2023 and June 2024 with phase 4 OSCE examiners at the Yong Loo Lin School of Medicine. In cycle 1, psychometric analyses of examiner scoring for a phase 4 OSCE informed the design of individualized reports, which were evaluated through interviews. Revisions were made to the format of the report and implemented in cycle 2, where examiner responses were again collected. Data were analyzed thematically, supported by reflective logs and field notes.
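The abstract does not specify which psychometric indices were fed back to examiners; the sketch below is only a guess at the kind of per-examiner summary (mean awarded score and a stringency/leniency z-score relative to the station mean) such an individualized report could contain.

```python
# Hypothetical examiner-report metrics; examiners, stations, and scores are invented.
import pandas as pd

scores = pd.DataFrame({
    "examiner": ["A", "A", "B", "B", "C", "C"],
    "station":  ["S1", "S2", "S1", "S2", "S1", "S2"],
    "score":    [14, 16, 11, 12, 18, 17],
})

station_stats = scores.groupby("station")["score"].agg(["mean", "std"])
scores = scores.join(station_stats, on="station")
scores["z"] = (scores["score"] - scores["mean"]) / scores["std"]   # stringency/leniency
print(scores.groupby("examiner")["z"].mean().rename("mean_z_vs_station"))
```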
Results: Nine examiners participated in cycle 1 and 7 in cycle 2. In cycle 1, examiners highlighted challenges in interpreting complex terminology, leading to report refinements such as glossaries and visual graphs. In cycle 2, examiners demonstrated greater confidence in applying feedback, requested longitudinal reports, and shifted from initial resistance to reflective engagement. Across cycles, the reports improved credibility, neutrality, and examiner self-regulation.
Conclusion: This exploratory study suggests that psychometric-informed feedback reports can facilitate examiner reflection and transparency in OSCEs. While the findings highlight feasibility and examiner acceptance, longitudinal delivery of feedback, collection of quantitative outcome data, and larger samples are needed to establish whether such reports improve scoring consistency and assessment fairness.
{"title":"Leveraging feedback mechanisms to improve the quality of objective structured clinical examinations in Singapore: an exploratory action research study.","authors":"Han Ting Jillian Yeo, Dujeepa Dasharatha Samarasekera, Michael Dean","doi":"10.3352/jeehp.2025.22.28","DOIUrl":"10.3352/jeehp.2025.22.28","url":null,"abstract":"<p><strong>Purpose: </strong>Variability in examiner scoring threatens the fairness and reliability of objective structured clinical examinations (OSCEs). While examiner standardization exists, there is currently no structured, psychometric-informed, individualized feedback mechanism for examiners. This study explored the feasibility and perceived value of such a mechanism using an action research approach to co-design and iteratively refine examiner feedback reports.</p><p><strong>Methods: </strong>Two exploratory cycles were conducted between November 2023 and June 2024 with phase 4 OSCE examiners at the Yong Loo Lin School of Medicine. In cycle 1, psychometric analyses of examiner scoring for a phase 4 OSCE informed the design of individualized reports, which were evaluated through interviews. Revisions were made to the format of the report and implemented in cycle 2, where examiner responses were again collected. Data were analyzed thematically, supported by reflective logs and field notes.</p><p><strong>Results: </strong>Nine examiners participated in cycle 1 and 7 in cycle 2. In cycle 1, examiners highlighted challenges in interpreting complex terminology, leading to report refinements such as glossaries and visual graphs. In cycle 2, examiners demonstrated greater confidence in applying feedback, requested longitudinal reports, and shifted from initial resistance to reflective engagement. Across cycles, the reports improved credibility, neutrality, and examiner self-regulation.</p><p><strong>Conclusion: </strong>This exploratory study suggests that psychometric-informed feedback reports can facilitate examiner reflection and transparency in OSCEs. While the findings highlight feasibility and examiner acceptance, longitudinal delivery of feedback, collection of quantitative outcome data, and larger samples are needed to establish whether such reports improve scoring consistency and assessment fairness.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"22 ","pages":"28"},"PeriodicalIF":3.7,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12768547/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145193038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Proposal for setting a passing score for the Korean Nursing Licensing Examination
Pub Date: 2025-01-01. Epub Date: 2025-09-08. DOI: 10.3352/jeehp.2025.22.25
Janghee Park, Mi Kyoung Yim, Sujin Shin, Rhayun Song, Jun-Ah Song, Inyoung Lee, Heejeong Kim, Minjae Lee
Purpose: The Korean Nursing Licensing Examination (KNLE) is planning to transition to a computer-based test (CBT). This study aims to propose a reasonable and efficient method for setting passing scores.
Methods: A standard-setting (passing score) analysis was conducted with an expert panel using items from the past 3 years of the national nursing examination. A modified Angoff procedure was used for standard setting, and the validity of the passing score was verified with the Hofstee method. The standard-setting workshop was conducted in 2 stages: first, a pilot workshop for 2 subjects, followed by a second workshop in which 6 additional subjects were selected based on the pilot results. For items with an actual correct answer rate of 90% or higher, the estimated correct answer rate for minimum competency was calculated from the observed correct answer rate. A survey and discussion with the expert panel were also conducted regarding the standard-setting procedures and results.
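The arithmetic behind this procedure can be illustrated as follows; the panelist estimates, item rates, and the 90% substitution rule below are a hedged sketch with invented numbers, not the KNLE data.

```python
# Hedged sketch of a modified Angoff passing score with the >=90% substitution rule.
import numpy as np

# Rows = panelists, columns = items: estimated probability that a minimally
# competent candidate answers each item correctly (invented values).
angoff = np.array([
    [0.60, 0.75, 0.55, 0.80],
    [0.65, 0.70, 0.50, 0.85],
    [0.55, 0.80, 0.60, 0.75],
])
observed_rate = np.array([0.72, 0.93, 0.61, 0.95])   # actual correct-answer rates

# For very easy items (observed rate >= 90%), use the observed rate instead.
adjusted = np.where(observed_rate >= 0.90, observed_rate, angoff)
passing_pct = adjusted.sum(axis=1).mean() / angoff.shape[1] * 100
print(f"Passing score = {passing_pct:.1f}% of the maximum")
```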
Results: The passing score for the national nursing examination calculated using the new method was slightly higher than the existing passing score. The nursing subject had similar results; however, the legal subjects varied.
Conclusion: The modified Angoff and Hofstee methods were successfully applied to the KNLE. Using the actual correct answer rate as an indicator to derive expected minimum competency was shown to be effective. This approach could streamline future standard-setting processes, particularly when converting to CBT.
{"title":"Proposal for setting a passing score for the Korean Nursing Licensing Examination.","authors":"Janghee Park, Mi Kyoung Yim, Sujin Shin, Rhayun Song, Jun-Ah Song, Inyoung Lee, Heejeong Kim, Minjae Lee","doi":"10.3352/jeehp.2025.22.25","DOIUrl":"10.3352/jeehp.2025.22.25","url":null,"abstract":"<p><strong>Purpose: </strong>The Korean Nursing Licensing Examination (KNLE) is planning to transition to a computer-based test (CBT). This study aims to propose a reasonable and efficient method for setting passing scores.</p><p><strong>Methods: </strong>A standard setting (passing score setting) analysis was conducted using an expert panel over the past 3 years of the national nursing examination. The standard-setting method was modified from Angoff, and the validity of the passing score was verified through the Hofstee method. The standard-setting workshop was conducted in 2 stages: first, a pilot workshop for 2 subjects, followed by a second workshop where 6 additional subjects were selected based on the pilot results. For items with an actual correct answer rate of 90% or higher, the estimated correct answer rate for minimum competency was calculated using the observed correct answer rate. A survey and discussion with the expert panel were also conducted regarding the standard-setting procedures and results.</p><p><strong>Results: </strong>The passing score for the national nursing examination was calculated using the new method, and the score was slightly higher than the existing score. The nursing subject had similar results,; however, the legal subjects varied.</p><p><strong>Conclusion: </strong>The modified Angoff and Hofstee methods were successfully applied to the KNLE. Using the actual correct answer rate as an indicator to derive expected minimum competency was shown to be effective. This approach could streamline future standard-setting processes, particularly when converting to CBT.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"22 ","pages":"25"},"PeriodicalIF":3.7,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145150586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Assessing genetic and genomic literacy concepts among Albanian nursing and midwifery students: a cross-sectional study
Elona Gaxhja, Mitilda Gugu, Angelo Dante, Armelda Teta, Armela Kapaj, Liljana Ramasaco
Pub Date: 2025-01-01. DOI: 10.3352/jeehp.2025.22.13
Purpose: This study aimed to adapt and validate the Albanian version of the Genomic Nursing Concept Inventory (GNCI) and to assess the level of genomic literacy among nursing and midwifery students.
Methods: Data were collected via a monocentric online cross-sectional study using the Albanian version of the GNCI. Participants included first-, second-, and third-year nursing and midwifery students. Demographic data such as age, sex, year level, and prior exposure to genetics were collected. The Kruskal-Wallis, Mann-Whitney U, and chi-square tests were used to compare demographic characteristics and GNCI scores between groups.
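A minimal sketch of those comparisons using scipy is given below; all scores and counts are fabricated for illustration.

```python
# Hedged sketch of the group comparisons named in the Methods.
import numpy as np
from scipy.stats import kruskal, mannwhitneyu, chi2_contingency

year1 = np.array([6, 8, 7, 5, 9, 6])                 # invented GNCI scores by year level
year2 = np.array([7, 9, 8, 6, 10, 7])
year3 = np.array([8, 10, 9, 7, 11, 8])

h, p_kw = kruskal(year1, year2, year3)               # score differences across years
u, p_mw = mannwhitneyu(year1, year3)                 # pairwise comparison
table = np.array([[40, 60], [25, 75]])               # e.g., prior genetics course by sex
chi2, p_chi, dof, _ = chi2_contingency(table)

print(f"Kruskal-Wallis H={h:.2f} (P={p_kw:.3f}); Mann-Whitney U={u:.1f} (P={p_mw:.3f}); "
      f"chi-square={chi2:.2f} (P={p_chi:.3f})")
```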
Results: Among the 715 participants, most were female (88.5%), with a median age of 19 years. Most respondents (65%) had not taken a genetics course, and 83.5% had not attended any related training. The mean GNCI score was 7.49, corresponding to 24.38% of items answered correctly.
Conclusion: The findings reveal a low foundational knowledge of genetics/genomics among future nurses and midwives. It is essential to enhance learning strategies and update curricula to prepare a competent healthcare workforce in precision health.
{"title":"Assessing genetic and genomic literacy concepts among Albanian nursing and midwifery students: a cross-sectional study.","authors":"Elona Gaxhja, Mitilda Gugu, Angelo Dante, Armelda Teta, Armela Kapaj, Liljana Ramasaco","doi":"10.3352/jeehp.2025.22.13","DOIUrl":"https://doi.org/10.3352/jeehp.2025.22.13","url":null,"abstract":"<p><strong>Purpose: </strong>This study aimed to adapt and validate the Albanian version of the Genomic Nursing Concept Inventory (GNCI) and to assess the level of genomic literacy among nursing and midwifery students.</p><p><strong>Methods: </strong>Data were collected via a monocentric online cross-sectional study using the Albanian version of the GNCI. Participants included first-, second-, and third-year nursing and midwifery students. Demographic data such as age, sex, year level, and prior exposure to genetics were collected. The Kruskal-Wallis, Mann-Whitney U, and chi-square tests were used to compare demographic characteristics and GNCI scores between groups.</p><p><strong>Results: </strong>Among the 715 participants, most were female (88.5%) with a median age of 19 years. Most respondents (65%) had not taken a genetics course, and 83.5% had not attended any related training. The mean score was 7.49, corresponding to a scale difficulty of 24.38% correct responses.</p><p><strong>Conclusion: </strong>The findings reveal a low foundational knowledge of genetics/genomics among future nurses and midwives. It is essential to enhance learning strategies and update curricula to prepare a competent healthcare workforce in precision health.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"22 ","pages":"13"},"PeriodicalIF":9.3,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144250211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}