Purpose: The coronavirus disease 2019 (COVID-19) pandemic limited healthcare professional education and training opportunities in rural communities. Because the US Department of Veterans Affairs (VA) has robust programs to train clinicians in the United States, this study examined VA trainee perspectives regarding pandemic-related training in rural and urban areas and interest in future employment with the VA.
Methods: Survey responses were collected nationally from VA physician and nursing trainees before and after the onset of COVID-19 (2018 to 2021). Logistic regression models were used to test the associations of pandemic timing (pre-pandemic or pandemic), trainee program (physician or nurse), and their interaction with VA trainee satisfaction and with trainees' likelihood of considering future VA employment in rural and urban areas.
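To make the modeling approach concrete, here is a minimal sketch of a logistic regression with a pandemic-by-program interaction term, in the spirit of the Methods above. The variable names (satisfied, pandemic, physician) and the simulated data are illustrative assumptions, not the study's dataset or code, and the rural/urban stratification is omitted for brevity.

```python
# Minimal sketch of a logistic regression with a pandemic-by-program
# interaction term, as described in the Methods; variable names and the
# simulated data are illustrative assumptions, not the study's dataset.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "pandemic": rng.integers(0, 2, n),    # 0 = pre-pandemic, 1 = pandemic
    "physician": rng.integers(0, 2, n),   # 0 = nursing trainee, 1 = physician trainee
})
# Simulated binary outcome: overall satisfaction with training (1 = satisfied)
logit_p = 0.8 - 0.5 * df["pandemic"] - 0.4 * df["pandemic"] * df["physician"]
p = 1 / (1 + np.exp(-np.asarray(logit_p)))
df["satisfied"] = rng.binomial(1, p)

# Main effects plus the interaction of pandemic timing and trainee program
model = smf.logit("satisfied ~ pandemic * physician", data=df).fit()
print(model.summary())
print(np.exp(model.params))  # odds ratios
```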
Results: While physician trainees at urban facilities reported decreases in overall training satisfaction and corresponding decreases in the likelihood of considering future VA employment from pre-pandemic to pandemic, rural physician trainees showed no changes in either outcome. In contrast, while nursing trainees at both urban and rural sites had decreases in training satisfaction associated with the pandemic, there was no corresponding effect on the likelihood of future employment by nurses at either urban or rural VA sites.
Conclusion: The study’s findings suggest differences in the training experiences of physicians and nurses at rural sites, as well as between physician trainees at urban and rural sites. Understanding these nuances can inform the development of targeted approaches to address the ongoing provider shortages that rural communities in the United States are facing.
{"title":"Training satisfaction and future employment consideration among physician and nursing trainees at rural Veterans Affairs facilities in the United States during COVID-19: a time-series before and after study","authors":"Heather Northcraft, Tiffany Radcliff, Anne Reid Griffin, Jia Bai, Aram Dobalian","doi":"10.3352/jeehp.2024.21.25","DOIUrl":"10.3352/jeehp.2024.21.25","url":null,"abstract":"<p><strong>Purpose: </strong>The coronavirus disease 2019 (COVID-19) pandemic limited healthcare professional education and training opportunities in rural communities. Because the US Department of Veterans Affairs (VA) has robust programs to train clinicians in the United States, this study examined VA trainee perspectives regarding pandemic-related training in rural and urban areas and interest in future employment with the VA.</p><p><strong>Methods: </strong>Survey responses were collected nationally from VA physicians and nursing trainees before and after COVID-19 (2018 to 2021). Logistic regression models were used to test the association between pandemic timing (pre-pandemic or pandemic), trainee program (physician or nurse), and the interaction of trainee pandemic timing and program on VA trainee satisfaction and trainee likelihood to consider future VA employment in rural and urban areas.</p><p><strong>Results: </strong>While physician trainees at urban facilities reported decreases in overall training satisfaction and corresponding decreases in the likelihood of considering future VA employment from pre-pandemic to pandemic, rural physician trainees showed no changes in either outcome. In contrast, while nursing trainees at both urban and rural sites had decreases in training satisfaction associated with the pandemic, there was no corresponding effect on the likelihood of future employment by nurses at either urban or rural VA sites.</p><p><strong>Conclusion: </strong>The study’s findings suggest differences in the training experiences of physicians and nurses at rural sites, as well as between physician trainees at urban and rural sites. Understanding these nuances can inform the development of targeted approaches to address the ongoing provider shortages that rural communities in the United States are facing.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"21 ","pages":"25"},"PeriodicalIF":9.3,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11528153/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142356105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-01-01 | Epub Date: 2024-10-02 | DOI: 10.3352/jeehp.2024.21.27
Minkyung Oh, Bo Young Yoon
Purpose: The Dr. LEE Jong-wook Fellowship Program, established by the Korea Foundation for International Healthcare (KOFIH), aims to strengthen healthcare capacity in partner countries. The aim of the study was to develop new performance evaluation indicators for the program to better assess long-term educational impact across various courses and professional roles.
Methods: A 3-stage process was employed. First, a literature review of established evaluation models (Kirkpatrick's 4 levels; the context/input/process/product evaluation model; and the Organization for Economic Cooperation and Development's Development Assistance Committee criteria) was conducted to devise evaluation criteria. Second, these criteria were validated via a 2-round Delphi survey of 18 experts in training projects from May 2021 to June 2021. Third, the relative importance of the evaluation criteria was determined using the analytic hierarchy process (AHP), calculating weights and checking consistency through the consistency index and consistency ratio (CR), with CR values below 0.1 indicating acceptable consistency.
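As a sketch of the AHP step described above, the snippet below derives priority weights from a pairwise comparison matrix and computes the consistency index and consistency ratio. The 4x4 matrix is a hypothetical example, not the experts' actual judgments.

```python
# Minimal sketch of AHP weight calculation and the consistency ratio (CR)
# check described above; the example pairwise comparison matrix is invented
# for illustration, not taken from the study.
import numpy as np

# Saaty's random index (RI) values for matrix sizes 3..8
RI = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41}

def ahp_weights_and_cr(A):
    """Return priority weights and consistency ratio for a pairwise matrix A."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)               # principal eigenvalue
    weights = np.abs(eigvecs[:, k].real)
    weights /= weights.sum()                  # normalized priority weights
    lam_max = eigvals[k].real
    ci = (lam_max - n) / (n - 1)              # consistency index
    cr = ci / RI[n]                           # consistency ratio
    return weights, cr

# Hypothetical 4x4 comparison of 4 evaluation areas
A = [[1,   3,   2,   5],
     [1/3, 1,   1/2, 2],
     [1/2, 2,   1,   3],
     [1/5, 1/2, 1/3, 1]]
w, cr = ahp_weights_and_cr(A)
print("weights:", np.round(w, 3), "CR:", round(cr, 3))  # CR < 0.1 => acceptable
```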
Results: The literature review led to a combined evaluation model, resulting in 4 evaluation areas, 20 items, and 92 indicators. The Delphi surveys confirmed the validity of these indicators, with content validity ratio values exceeding 0.444. The AHP analysis assigned weights to each indicator, and CR values below 0.1 indicated consistency. The final set of evaluation indicators was confirmed through a workshop with KOFIH and adopted as the new evaluation tool.
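The 0.444 content validity ratio cutoff reported above follows Lawshe's formula. Assuming the denominator is the panel of 18 Delphi experts, the cutoff corresponds to at least 13 of 18 experts rating an item as essential, as this short sketch shows.

```python
# Sketch of Lawshe's content validity ratio (CVR), which the 0.444 cutoff
# above is based on; assuming the panel is the 18 Delphi experts, CVR = 0.444
# corresponds to at least 13 of 18 experts rating an item "essential".
def content_validity_ratio(n_essential: int, n_panelists: int) -> float:
    """CVR = (n_e - N/2) / (N/2)."""
    half = n_panelists / 2
    return (n_essential - half) / half

for n_e in range(9, 19):
    print(n_e, round(content_validity_ratio(n_e, 18), 3))
# 13 "essential" ratings out of 18 gives CVR = 0.444, the threshold reported above.
```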
Conclusion: The developed evaluation framework provides a comprehensive tool for assessing the long-term outcomes of the Dr. LEE Jong-wook Fellowship Program. It enhances evaluation capabilities and supports improvements in the training program’s effectiveness and international healthcare collaboration.
{"title":"A new performance evaluation indicator for the LEE Jong-wook Fellowship Program of Korea Foundation for International Healthcare to better assess its long-term educational impacts: a Delphi study.","authors":"Minkyung Oh, Bo Young Yoon","doi":"10.3352/jeehp.2024.21.27","DOIUrl":"10.3352/jeehp.2024.21.27","url":null,"abstract":"<p><strong>Purpose: </strong>The Dr. LEE Jong-wook Fellowship Program, established by the Korea Foundation for International Healthcare (KOFIH), aims to strengthen healthcare capacity in partner countries. The aim of the study was to develop new performance evaluation indicators for the program to better assess long-term educational impact across various courses and professional roles.</p><p><strong>Methods: </strong>A 3-stage process was employed. First, a literature review of established evaluation models (Kirkpatrick’s 4 levels, context/input/process/product evaluation model, Organization for Economic Cooperation and Development Assistance Committee criteria) was conducted to devise evaluation criteria. Second, these criteria were validated via a 2-round Delphi survey with 18 experts in training projects from May 2021 to June 2021. Third, the relative importance of the evaluation criteria was determined using the analytic hierarchy process (AHP), calculating weights and ensuring consistency through the consistency index and consistency ratio (CR), with CR values below 0.1 indicating acceptable consistency.</p><p><strong>Results: </strong>The literature review led to a combined evaluation model, resulting in 4 evaluation areas, 20 items, and 92 indicators. The Delphi surveys confirmed the validity of these indicators, with content validity ratio values exceeding 0.444. The AHP analysis assigned weights to each indicator, and CR values below 0.1 indicated consistency. The final set of evaluation indicators was confirmed through a workshop with KOFIH and adopted as the new evaluation tool.</p><p><strong>Conclusion: </strong>The developed evaluation framework provides a comprehensive tool for assessing the long-term outcomes of the Dr. LEE Jong-wook Fellowship Program. It enhances evaluation capabilities and supports improvements in the training program’s effectiveness and international healthcare collaboration.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"21 ","pages":"27"},"PeriodicalIF":9.3,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11535579/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142366885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-01-01 | Epub Date: 2024-04-02 | DOI: 10.3352/jeehp.2024.21.8
Yoonjung Lee, Min-jung Lee, Junmoo Ahn, Chungwon Ha, Ye Ji Kang, Cheol Woong Jung, Dong-Mi Yoo, Jihye Yu, Seung-Hee Lee
Purpose: This study aimed to identify challenges and potential improvements in Korea’s medical education accreditation process according to the Accreditation Standards of the Korean Institute of Medical Education and Evaluation 2019 (ASK2019). Meta-evaluation was conducted to survey the experiences and perceptions of stakeholders, including self-assessment committee members, site visit committee members, administrative staff, and medical school professors.
Methods: A cross-sectional study was conducted using surveys sent to 40 medical schools. The 332 participants included self-assessment committee members, site visit team members, administrative staff, and medical school professors. The t-test, one-way analysis of variance, and the chi-square test were used to analyze and compare opinions on medical education accreditation across the participant categories.
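For readers unfamiliar with the three tests named above, the following sketch shows how such group comparisons are typically run. The stakeholder groups, sample sizes, and Likert-style data are simulated placeholders, not the survey data.

```python
# Minimal sketch of the three tests named in the Methods (t-test, one-way
# ANOVA, chi-square); the groups and Likert-style scores are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
site_visit = rng.normal(4.2, 0.6, 80)    # e.g., perceived necessity of accreditation
faculty = rng.normal(3.8, 0.7, 120)

# Two-group comparison
t, p = stats.ttest_ind(site_visit, faculty)

# More than two groups: one-way ANOVA
staff = rng.normal(4.0, 0.6, 60)
f, p_anova = stats.f_oneway(site_visit, faculty, staff)

# Categorical responses (e.g., agree/neutral/disagree) by group: chi-square
table = np.array([[40, 25, 15],
                  [50, 40, 30]])
chi2, p_chi, dof, expected = stats.chi2_contingency(table)
print(round(p, 3), round(p_anova, 3), round(p_chi, 3))
```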
Results: Site visit committee members placed greater importance on the necessity of accreditation than faculty members did. Self-assessment committee members and professors shared a positive view of accreditation's role in improving educational quality. Administrative staff rated the reliability and objectivity of the Korean Institute of Medical Education and Evaluation more highly than self-assessment committee members did. Site visit committee members perceived the clarity of the accreditation standards more positively than self-assessment committee members did. Administrative staff were the most optimistic about implementing the standards. However, the accreditation process encountered challenges, especially duplicated content and the burden of preparing self-assessment reports. Finally, perceptions of the accuracy of the final site visit reports differed significantly between self-assessment committee members and site visit committee members.
Conclusion: This study revealed diverse views on medical education accreditation, highlighting the need for improved communication, expectation alignment, and stakeholder collaboration to refine the accreditation process and quality.
{"title":"Challenges and potential improvements in the Accreditation Standards of the Korean Institute of Medical Education and Evaluation 2019 (ASK2019) derived through meta-evaluation: a cross-sectional study","authors":"Yoonjung Lee, Min-jung Lee, Junmoo Ahn, Chungwon Ha, Ye Ji Kang, Cheol Woong Jung, Dong-Mi Yoo, Jihye Yu, Seung-Hee Lee","doi":"10.3352/jeehp.2024.21.8","DOIUrl":"10.3352/jeehp.2024.21.8","url":null,"abstract":"<p><strong>Purpose: </strong>This study aimed to identify challenges and potential improvements in Korea’s medical education accreditation process according to the Accreditation Standards of the Korean Institute of Medical Education and Evaluation 2019 (ASK2019). Meta-evaluation was conducted to survey the experiences and perceptions of stakeholders, including self-assessment committee members, site visit committee members, administrative staff, and medical school professors.</p><p><strong>Methods: </strong>A cross-sectional study was conducted using surveys sent to 40 medical schools. The 332 participants included self-assessment committee members, site visit team members, administrative staff, and medical school professors. The t-test, one-way analysis of variance and the chi-square test were used to analyze and compare opinions on medical education accreditation between the categories of participants.</p><p><strong>Results: </strong>Site visit committee members placed greater importance on the necessity of accreditation than faculty members. A shared positive view on accreditation’s role in improving educational quality was seen among self-evaluation committee members and professors. Administrative staff highly regarded the Korean Institute of Medical Education and Evaluation’s reliability and objectivity, unlike the self-evaluation committee members. Site visit committee members positively perceived the clarity of accreditation standards, differing from self-assessment committee members. Administrative staff were most optimistic about implementing standards. However, the accreditation process encountered challenges, especially in duplicating content and preparing self-evaluation reports. Finally, perceptions regarding the accuracy of final site visit reports varied significantly between the self-evaluation committee members and the site visit committee members.</p><p><strong>Conclusion: </strong>This study revealed diverse views on medical education accreditation, highlighting the need for improved communication, expectation alignment, and stakeholder collaboration to refine the accreditation process and quality.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"21 ","pages":"8"},"PeriodicalIF":4.4,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11108703/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140337062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-01-01 | Epub Date: 2024-07-08 | DOI: 10.3352/jeehp.2024.21.17
Max Samuel Yudovich, Elizaveta Makarova, Christian Michael Hague, Jay Dilip Raman
Purpose: This study aimed to evaluate the performance of Chat Generative Pre-Trained Transformer (ChatGPT) with respect to standardized urology multiple-choice items in the United States.
Methods: In total, 700 multiple-choice urology board exam-style items were submitted to GPT-3.5 and GPT-4, and responses were recorded. Items were categorized based on topic and question complexity (recall, interpretation, and problem-solving). The accuracy of GPT-3.5 and GPT-4 was compared across item types in February 2024.
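The abstract does not state which statistical test produced the P-values reported below; one plausible way to compare the two models' accuracy on the same 700 items is a chi-square test on the correct/incorrect counts, sketched here. Because both models saw the same items, a McNemar test on paired responses would be another defensible choice.

```python
# Illustrative comparison of the two models' overall accuracy; the test choice
# is an assumption, not necessarily the method used in the study.
import numpy as np
from scipy.stats import chi2_contingency

n_items = 700
correct_gpt4 = round(0.444 * n_items)   # 44.4% reported in the Results
correct_gpt35 = round(0.309 * n_items)  # 30.9% reported in the Results

table = np.array([
    [correct_gpt4, n_items - correct_gpt4],
    [correct_gpt35, n_items - correct_gpt35],
])
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.1f}, P={p:.2e}")  # a gap this large on n=700 is highly significant
```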
Results: GPT-4 answered 44.4% of items correctly, compared to 30.9% for GPT-3.5 (P<0.0001). GPT-4 (vs. GPT-3.5) had higher accuracy on urologic oncology (43.8% vs. 33.9%, P=0.03), sexual medicine (44.3% vs. 27.8%, P=0.046), and pediatric urology (47.1% vs. 27.1%, P=0.012) items. Endourology (38.0% vs. 25.7%, P=0.15), reconstruction and trauma (29.0% vs. 21.0%, P=0.41), and neurourology (49.0% vs. 33.3%, P=0.11) items showed no significant differences between versions. GPT-4 also outperformed GPT-3.5 on recall (45.9% vs. 27.4%, P<0.00001) and interpretation (45.6% vs. 31.5%, P=0.0005) items, whereas the difference for the higher-complexity problem-solving items (41.8% vs. 34.5%, P=0.56) was not significant.
Conclusion: ChatGPT performs relatively poorly on standardized multiple-choice urology board exam-style items, with GPT-4 outperforming GPT-3.5. The accuracy of both models was below the proposed minimum passing standard (60%) for the American Board of Urology's Continuing Urologic Certification knowledge reinforcement activity. As artificial intelligence progresses in complexity, ChatGPT may become more capable and accurate on board examination items. For now, its responses should be scrutinized.
{"title":"Performance of GPT-3.5 and GPT-4 on standardized urology knowledge assessment items in the United States: a descriptive study.","authors":"Max Samuel Yudovich, Elizaveta Makarova, Christian Michael Hague, Jay Dilip Raman","doi":"10.3352/jeehp.2024.21.17","DOIUrl":"https://doi.org/10.3352/jeehp.2024.21.17","url":null,"abstract":"<p><strong>Purpose: </strong>This study aimed to evaluate the performance of Chat Generative Pre-Trained Transformer (ChatGPT) with respect to standardized urology multiple-choice items in the United States.</p><p><strong>Methods: </strong>In total, 700 multiple-choice urology board exam-style items were submitted to GPT-3.5 and GPT-4, and responses were recorded. Items were categorized based on topic and question complexity (recall, interpretation, and problem-solving). The accuracy of GPT-3.5 and GPT-4 was compared across item types in February 2024.</p><p><strong>Results: </strong>GPT-4 answered 44.4% of items correctly compared to 30.9% for GPT-3.5 (P>0.0001). GPT-4 (vs. GPT-3.5) had higher accuracy with urologic oncology (43.8% vs. 33.9%, P=0.03), sexual medicine (44.3% vs. 27.8%, P=0.046), and pediatric urology (47.1% vs. 27.1%, P=0.012) items. Endourology (38.0% vs. 25.7%, P=0.15), reconstruction and trauma (29.0% vs. 21.0%, P=0.41), and neurourology (49.0% vs. 33.3%, P=0.11) items did not show significant differences in performance across versions. GPT-4 also outperformed GPT-3.5 with respect to recall (45.9% vs. 27.4%, P<0.00001), interpretation (45.6% vs. 31.5%, P=0.0005), and problem-solving (41.8% vs. 34.5%, P=0.56) type items. This difference was not significant for the higher-complexity items.</p><p><strong>Conclusion: </strong>s: ChatGPT performs relatively poorly on standardized multiple-choice urology board exam-style items, with GPT-4 outperforming GPT-3.5. The accuracy was below the proposed minimum passing standards for the American Board of Urology's Continuing Urologic Certification knowledge reinforcement activity (60%). As artificial intelligence progresses in complexity, ChatGPT may become more capable and accurate with respect to board examination items. For now, its responses should be scrutinized.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"21 ","pages":"17"},"PeriodicalIF":9.3,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141560038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Purpose: This study investigated the effect of simulation-based training on nursing students’ problem-solving skills, critical thinking skills, and self-efficacy.
Methods: A single-group pretest and posttest study was conducted among 173 second-year nursing students at a public university in Vietnam from May 2021 to July 2022. Each student participated in the adult nursing preclinical practice course, which utilized a moderate-fidelity simulation teaching approach. Instruments including the Personal Problem-Solving Inventory Scale, Critical Thinking Skills Questionnaire, and General Self-Efficacy Questionnaire were employed to measure participants’ problem-solving skills, critical thinking skills, and self-efficacy. Data were analyzed using descriptive statistics and the paired-sample t-test with the significance level set at P<0.05.
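A minimal sketch of the paired-sample t-test described above, using simulated pretest and posttest scores rather than the study's data:

```python
# Paired-sample t-test sketch; the pre/post scores are simulated stand-ins
# for the instrument scores, not the study data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
pre = rng.normal(131.4, 17.0, 173)        # e.g., pretest problem-solving scores
post = pre - rng.normal(4.2, 12.0, 173)   # posttest scores for the same students

t, p = stats.ttest_rel(pre, post)         # paired comparison, two-tailed by default
print(f"t(172)={t:.2f}, P={p:.3f}")
```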
Results: The mean posttest score on the Personal Problem-Solving Inventory (127.24±12.11) was lower than the pretest score (131.42±16.95); because lower scores on this instrument indicate better perceived problem-solving, this suggests an improvement in participants' problem-solving skills (t(172)=2.55, P=0.011). There was no statistically significant difference in critical thinking skills between the pretest and posttest (P=0.854). Self-efficacy among nursing students increased from the pretest (27.91±5.26) to the posttest (28.71±3.81) (t(172)=-2.26, P=0.025).
Conclusion: The results suggest that simulation-based training can improve problem-solving skills and increase self-efficacy among nursing students. Therefore, the integration of simulation-based training in nursing education is recommended.
{"title":"The effect of simulation-based training on problem-solving skills, critical thinking skills, and self-efficacy among nursing students in Vietnam: a before-and-after study.","authors":"Tran Thi Hoang Oanh, Luu Thi Thuy, Ngo Thi Thu Huyen","doi":"10.3352/jeehp.2024.21.24","DOIUrl":"10.3352/jeehp.2024.21.24","url":null,"abstract":"<p><strong>Purpose: </strong>This study investigated the effect of simulation-based training on nursing students’ problem-solving skills, critical thinking skills, and self-efficacy.</p><p><strong>Methods: </strong>A single-group pretest and posttest study was conducted among 173 second-year nursing students at a public university in Vietnam from May 2021 to July 2022. Each student participated in the adult nursing preclinical practice course, which utilized a moderate-fidelity simulation teaching approach. Instruments including the Personal Problem-Solving Inventory Scale, Critical Thinking Skills Questionnaire, and General Self-Efficacy Questionnaire were employed to measure participants’ problem-solving skills, critical thinking skills, and self-efficacy. Data were analyzed using descriptive statistics and the paired-sample t-test with the significance level set at P<0.05.</p><p><strong>Results: </strong>The mean score of the Personal Problem-Solving Inventory posttest (127.24±12.11) was lower than the pretest score (131.42±16.95), suggesting an improvement in the problem-solving skills of the participants (t172 =2.55, P=0.011). There was no statistically significant difference in critical thinking skills between the pretest and posttest (P=0.854). Self-efficacy among nursing students showed a substantial increase from the pretest (27.91±5.26) to the posttest (28.71±3.81), with t172 =-2.26 and P=0.025.</p><p><strong>Conclusion: </strong>The results suggest that simulation-based training can improve problem-solving skills and increase self-efficacy among nursing students. Therefore, the integration of simulation-based training in nursing education is recommended.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"21 ","pages":"24"},"PeriodicalIF":9.3,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11480641/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142298256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-01-01 | Epub Date: 2024-03-15 | DOI: 10.3352/jeehp.2024.21.6
Xiaojun Xu, Yixiao Chen, Jing Miao
Background: ChatGPT is a large language model (LLM) based on artificial intelligence (AI) that is capable of responding in multiple languages and generating nuanced, highly complex responses. While ChatGPT holds promising applications in medical education, its limitations and potential risks cannot be ignored.
Methods: A scoping review was conducted for English articles discussing ChatGPT in the context of medical education published after 2022. A literature search was performed using PubMed/MEDLINE, Embase, and Web of Science databases, and information was extracted from the relevant studies that were ultimately included.
Results: ChatGPT exhibits various potential applications in medical education, such as providing personalized learning plans and materials, creating clinical practice simulation scenarios, and assisting in writing articles. However, challenges associated with academic integrity, data accuracy, and potential harm to learning were also highlighted in the literature. The paper emphasizes certain recommendations for using ChatGPT, including the establishment of guidelines. Based on the review, 3 key research areas were proposed: cultivating the ability of medical students to use ChatGPT correctly, integrating ChatGPT into teaching activities and processes, and proposing standards for the use of AI by medical students.
Conclusion: ChatGPT has the potential to transform medical education, but careful consideration is required for its full integration. To harness the full potential of ChatGPT in medical education, attention should not only be given to the capabilities of AI but also to its impact on students and teachers.
{"title":"Opportunities, challenges, and future directions of large language models, including ChatGPT in medical education: a systematic scoping review","authors":"Xiaojun Xu, Yixiao Chen, Jing Miao","doi":"10.3352/jeehp.2024.21.6","DOIUrl":"10.3352/jeehp.2024.21.6","url":null,"abstract":"<p><strong>Background: </strong>ChatGPT is a large language model (LLM) based on artificial intelligence (AI) capable of responding in multiple languages and generating nuanced and highly complex responses. While ChatGPT holds promising applications in medical education, its limitations and potential risks cannot be ignored.</p><p><strong>Methods: </strong>A scoping review was conducted for English articles discussing ChatGPT in the context of medical education published after 2022. A literature search was performed using PubMed/MEDLINE, Embase, and Web of Science databases, and information was extracted from the relevant studies that were ultimately included.</p><p><strong>Results: </strong>ChatGPT exhibits various potential applications in medical education, such as providing personalized learning plans and materials, creating clinical practice simulation scenarios, and assisting in writing articles. However, challenges associated with academic integrity, data accuracy, and potential harm to learning were also highlighted in the literature. The paper emphasizes certain recommendations for using ChatGPT, including the establishment of guidelines. Based on the review, 3 key research areas were proposed: cultivating the ability of medical students to use ChatGPT correctly, integrating ChatGPT into teaching activities and processes, and proposing standards for the use of AI by medical students.</p><p><strong>Conclusion: </strong>ChatGPT has the potential to transform medical education, but careful consideration is required for its full integration. To harness the full potential of ChatGPT in medical education, attention should not only be given to the capabilities of AI but also to its impact on students and teachers.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"21 ","pages":"6"},"PeriodicalIF":4.4,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11035906/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140132845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-01-01 | Epub Date: 2024-02-28 | DOI: 10.3352/jeehp.2024.21.5
Anna Therese Cianciolo, Heeyoung Han, Lydia Anne Howes, Debra Lee Klamen, Sophia Matos
Purpose: We examined United States medical students’ self-reported feedback encounters during clerkship training to better understand in situ feedback practices. Specifically, we asked: Who do students receive feedback from, about what, when, where, and how do they use it? We explored whether curricular expectations for preceptors’ written commentary aligned with feedback as it occurs naturalistically in the workplace.
Methods: This study occurred from July 2021 to February 2022 at Southern Illinois University School of Medicine. We used qualitative survey-based experience sampling to gather students’ accounts of their feedback encounters in 8 core specialties. We analyzed the who, what, when, where, and why of 267 feedback encounters reported by 11 clerkship students over 30 weeks. Code frequencies were mapped qualitatively to explore patterns in feedback encounters.
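A small illustration of how coded feedback encounters can be tallied to explore who-by-what patterns, as described above; the codes and counts are invented examples, not the study's coding scheme.

```python
# Cross-tabulating invented "who" and "what" codes for feedback encounters;
# this only illustrates the frequency-mapping step, not the study's analysis.
import pandas as pd

encounters = pd.DataFrame({
    "who":  ["attending", "resident", "attending", "nurse", "resident"],
    "what": ["clinical reasoning", "technical skill", "communication",
             "communication", "clinical reasoning"],
})
print(pd.crosstab(encounters["who"], encounters["what"]))
```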
Results: Clerkship feedback occurs in patterns apparently related to the nature of clinical work in each specialty. These patterns may be attributable to each specialty’s “social learning ecosystem”—the distinctive learning environment shaped by the social and material aspects of a given specialty’s work, which determine who preceptors are, what students do with preceptors, and what skills or attributes matter enough to preceptors to comment on.
Conclusion: Comprehensive, standardized expectations for written feedback across specialties conflict with the reality of workplace-based learning. Preceptors may be better able—and more motivated—to document student performance that occurs as a natural part of everyday work. Nurturing social learning ecosystems could facilitate workplace-based learning such that, across specialties, students acquire a comprehensive clinical skillset appropriate for graduation.
{"title":"Discovering social learning ecosystems during clinical clerkship from United States medical students’ feedback encounters: a content analysis.","authors":"Anna Therese Cianciolo, Heeyoung Han, Lydia Anne Howes, Debra Lee Klamen, Sophia Matos","doi":"10.3352/jeehp.2024.21.5","DOIUrl":"10.3352/jeehp.2024.21.5","url":null,"abstract":"<p><strong>Purpose: </strong>We examined United States medical students’ self-reported feedback encounters during clerkship training to better understand in situ feedback practices. Specifically, we asked: Who do students receive feedback from, about what, when, where, and how do they use it? We explored whether curricular expectations for preceptors’ written commentary aligned with feedback as it occurs naturalistically in the workplace.</p><p><strong>Methods: </strong>This study occurred from July 2021 to February 2022 at Southern Illinois University School of Medicine. We used qualitative survey-based experience sampling to gather students’ accounts of their feedback encounters in 8 core specialties. We analyzed the who, what, when, where, and why of 267 feedback encounters reported by 11 clerkship students over 30 weeks. Code frequencies were mapped qualitatively to explore patterns in feedback encounters.</p><p><strong>Results: </strong>Clerkship feedback occurs in patterns apparently related to the nature of clinical work in each specialty. These patterns may be attributable to each specialty’s “social learning ecosystem”—the distinctive learning environment shaped by the social and material aspects of a given specialty’s work, which determine who preceptors are, what students do with preceptors, and what skills or attributes matter enough to preceptors to comment on.</p><p><strong>Conclusion: </strong>Comprehensive, standardized expectations for written feedback across specialties conflict with the reality of workplace-based learning. Preceptors may be better able—and more motivated—to document student performance that occurs as a natural part of everyday work. Nurturing social learning ecosystems could facilitate workplace-based learning such that, across specialties, students acquire a comprehensive clinical skillset appropriate for graduation.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"21 ","pages":"5"},"PeriodicalIF":4.4,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10948917/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139984162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-01-01 | Epub Date: 2024-11-18 | DOI: 10.3352/jeehp.2024.21.33
Matthew Jian Wen Low, Gene Wai Han Chan, Zisheng Li, Yiwen Koh, Chi Loong Jen, Zi Yao Lee, Lenard Tai Win Cheng
Purpose: This study aimed to compare cognitive, non-cognitive, and overall learning outcomes for sepsis and trauma resuscitation skills in novices with virtual patient simulation (VPS) versus in-person simulation (IPS).
Methods: A randomized controlled trial was conducted on junior doctors in emergency departments from January to December 2022, comparing 70 minutes of VPS (n=19) versus IPS (n=21) in sepsis and trauma resuscitation. Using the nominal group technique, we created skills assessment checklists and determined Bloom's taxonomy domains for each checklist item. Two blinded raters observed participants leading 1 sepsis and 1 trauma resuscitation simulation. Satisfaction was measured using the Student Satisfaction with Learning Scale (SSLS). The SSLS and checklist scores were analyzed using the 2-tailed t-test.
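To illustrate the analysis described above, the sketch below runs an independent two-tailed t-test and adds Cohen's d and a 95% confidence interval for the mean difference. The group sizes match the study (VPS n=19, IPS n=21), but the checklist scores are simulated.

```python
# Independent two-tailed t-test plus Cohen's d and a 95% CI for the mean
# difference; scores are simulated, not the study's checklist data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
vps = rng.normal(20.0, 4.0, 19)
ips = rng.normal(18.0, 4.5, 21)

t, p = stats.ttest_ind(vps, ips)

# Cohen's d with a pooled standard deviation
n1, n2 = len(vps), len(ips)
sp = np.sqrt(((n1 - 1) * vps.var(ddof=1) + (n2 - 1) * ips.var(ddof=1)) / (n1 + n2 - 2))
d = (vps.mean() - ips.mean()) / sp

# 95% CI for the difference in means
diff = vps.mean() - ips.mean()
se = sp * np.sqrt(1 / n1 + 1 / n2)
ci = diff + np.array([-1, 1]) * stats.t.ppf(0.975, n1 + n2 - 2) * se
print(f"t={t:.2f}, P={p:.3f}, d={d:.2f}, 95% CI [{ci[0]:.1f}, {ci[1]:.1f}]")
```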
Results: For sepsis, there was no significant difference between VPS and IPS in overall scores (2.0; 95% confidence interval [CI], -1.4 to 5.4; Cohen's d=0.38), as well as in items that were cognitive (1.1; 95% CI, -1.5 to 3.7) and not only cognitive (0.9; 95% CI, -0.4 to 2.2). Likewise, for trauma, there was no significant difference in overall scores (-0.9; 95% CI, -4.1 to 2.3; Cohen's d=0.19), as well as in items that were cognitive (-0.3; 95% CI, -2.8 to 2.1) and not only cognitive (-0.6; 95% CI, -2.4 to 1.3). The median SSLS scores were lower with VPS than with IPS (-3.0; 95% CI, -5.0 to -1.0).
Conclusion: For novices, there were no major differences in overall and non-cognitive learning outcomes for sepsis and trauma resuscitation between VPS and IPS. Learners were more satisfied with IPS than with VPS (clinicaltrials.gov identifier: NCT05201950).
{"title":"Comparison of virtual and in-person simulations for sepsis and trauma resuscitation training in Singapore: a randomized controlled trial.","authors":"Matthew Jian Wen Low, Gene Wai Han Chan, Zisheng Li, Yiwen Koh, Chi Loong Jen, Zi Yao Lee, Lenard Tai Win Cheng","doi":"10.3352/jeehp.2024.21.33","DOIUrl":"10.3352/jeehp.2024.21.33","url":null,"abstract":"<p><strong>Purpose: </strong>This study aimed to compare cognitive, non-cognitive, and overall learning outcomes for sepsis and trauma resuscitation skills in novices with virtual patient simulation (VPS) versus in-person simulation (IPS).</p><p><strong>Methods: </strong>A randomized controlled trial was conducted on junior doctors in emergency departments from January to December 2022, comparing 70 minutes of VPS (n=19) versus IPS (n=21) in sepsis and trauma resuscitation. Using the nominal group technique, we created skills assessment checklists and determined Bloom's taxonomy domains for each checklist item. Two blinded raters observed participants leading 1 sepsis and 1 trauma resuscitation simulation. Satisfaction was measured using the Student Satisfaction with Learning Scale (SSLS). The SSLS and checklist scores were analyzed using the 2-tailed t-test.</p><p><strong>Results: </strong>For sepsis, there was no significant difference between VPS and IPS in overall scores (2.0; 95% confidence interval [CI], -1.4 to 5.4; Cohen's d=0.38), as well as in items that were cognitive (1.1; 95% CI, -1.5 to 3.7) and not only cognitive (0.9; 95% CI, -0.4 to 2.2). Likewise, for trauma, there was no significant difference in overall scores (-0.9; 95% CI, -4.1 to 2.3; Cohen's d=0.19), as well as in items that were cognitive (-0.3; 95% CI, -2.8 to 2.1) and not only cognitive (-0.6; 95% CI, -2.4 to 1.3). The median SSLS scores were lower with VPS than with IPS (-3.0; 95% CI, -1.0 to -5.0).</p><p><strong>Conclusion: </strong>For novices, there were no major differences in overall and non-cognitive learning outcomes for sepsis and trauma resuscitation between VPS and IPS. Learners were more satisfied with IPS than with VPS (clinicaltrials.gov identifier: NCT05201950).</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"21 ","pages":"33"},"PeriodicalIF":9.3,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142648693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-01-01 | Epub Date: 2024-11-11 | DOI: 10.3352/jeehp.2024.21.32
Marcos Carvalho Borges, Luciane Loures Santos, Paulo Henrique Manso, Elaine Christine Dantas Moisés, Pedro Soler Coltro, Priscilla Costa Fonseca, Paulo Roberto Alves Gentil, Rodrigo de Carvalho Santana, Lucas Faria Rodrigues, Benedito Carlos Maciel, Hilton Marcos Alves Ricz
Purpose: With the COVID-19 pandemic, online high-stakes exams have become a viable alternative. This study evaluated the feasibility of computer-based testing (CBT) for medical residency applications in Brazil and its impacts on item quality and applicants' access compared to paper-based testing.
Methods: In 2020, an online CBT was conducted at the Ribeirao Preto Clinical Hospital in Brazil. In total, 120 multiple-choice items were constructed. Two years later, the exam was administered as paper-based testing. The item construction process was similar for both exams. Difficulty and discrimination indexes and the point-biserial coefficient (classical test theory), along with difficulty, discrimination, and guessing parameters (item response theory) and the Cronbach's alpha coefficient, were measured. Internet stability for applicants was monitored.
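The classical test theory statistics named above can be illustrated with a short sketch that computes the difficulty index, corrected point-biserial discrimination, and Cronbach's alpha from a simulated 0/1 response matrix; this is illustrative only and does not reproduce the study's item response theory parameter estimation.

```python
# Classical test theory item statistics from a simulated response matrix;
# not the study's data or code.
import numpy as np

rng = np.random.default_rng(4)
n_applicants, n_items = 500, 120
ability = rng.normal(0, 1, n_applicants)
difficulty = rng.normal(0, 1, n_items)
p_correct = 1 / (1 + np.exp(-(ability[:, None] - difficulty[None, :])))
X = rng.binomial(1, p_correct)               # rows = applicants, columns = items

# Difficulty index: proportion answering each item correctly
diff_index = X.mean(axis=0)

# Point-biserial discrimination: correlation of each item with the rest-score
total = X.sum(axis=1)
pbis = np.array([np.corrcoef(X[:, j], total - X[:, j])[0, 1] for j in range(n_items)])

# Cronbach's alpha
item_var = X.var(axis=0, ddof=1).sum()
total_var = total.var(ddof=1)
alpha = n_items / (n_items - 1) * (1 - item_var / total_var)
print(round(alpha, 3), round(diff_index.mean(), 2), round(pbis.mean(), 2))
```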
Results: In 2020, 4,846 individuals (57.1% female, mean age of 26.64 ± 3.37 years) applied to the residency program, versus 2,196 individuals (55.2% female, mean age of 26.47 ± 3.20 years) in 2022. For CBT, there was an increase of 2,650 (120.7%) applicants, albeit with significant differences in demographic characteristics. There was a significant increase in applicants from more distant and lower-income Brazilian regions, such as the North (5.6% vs. 2.7%) and Northeast (16.9% vs. 9.0%). No significant differences were found in difficulty and discrimination indexes, point-biserial coefficients, and Cronbach's alpha coefficients between the 2 exams.
Conclusion: Online CBT with multiple-choice questions was a viable format for a residency application exam, improving accessibility without compromising exam integrity and quality.
{"title":"Increased accessibility of computer-based testing for residency application to a hospital in Brazil with item characteristics comparable to paper-based testing: a psychometric study.","authors":"Marcos Carvalho Borges, Luciane Loures Santos, Paulo Henrique Manso, Elaine Christine Dantas Moisés, Pedro Soler Coltro, Priscilla Costa Fonseca, Paulo Roberto Alves Gentil, Rodrigo de Carvalho Santana, Lucas Faria Rodrigues, Benedito Carlos Maciel, Hilton Marcos Alves Ricz","doi":"10.3352/jeehp.2024.21.32","DOIUrl":"https://doi.org/10.3352/jeehp.2024.21.32","url":null,"abstract":"<p><strong>Purpose: </strong>With the COVID-19 pandemic, online high-stakes exams have become a viable alternative. This study evaluated the feasibility of computer-based testing (CBT) for medical residency applications in Brazil and its impacts on item quality and applicants' access compared to paper-based testing.</p><p><strong>Methods: </strong>In 2020, an online CBT was conducted in a Ribeirao Preto Clinical Hospital in Brazil. In total, 120 multiple-choice question items were constructed. Two years later, the exam was performed as paper-based testing. Item construction processes were similar for both exams. Difficulty and discrimination indexes, point-biserial coefficient, difficulty, discrimination, guessing parameters, and Cronbach's alpha coefficient were measured based on the item response and classical test theories. Internet stability for applicants was monitored.</p><p><strong>Results: </strong>In 2020, 4,846 individuals (57.1% female, mean age of 26.64 ± 3.37 years) applied to the residency program, versus 2,196 individuals (55.2% female, mean age of 26.47 ± 3.20 years) in 2022. For CBT, there was an increase of 2,650 (120.7%) applicants, albeit with significant differences in demographic characteristics. There was a significant increase in applicants from more distant and lower-income Brazilian regions, such as the North (5.6% vs. 2.7%) and Northeast (16.9% vs. 9.0%). No significant differences were found in difficulty and discrimination indexes, point-biserial coefficients, and Cronbach's alpha coefficients between the 2 exams.</p><p><strong>Conclusion: </strong>Online CBT with multiple-choice questions was a viable format for a residency application exam, improving accessibility without compromising exam integrity and quality.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"21 ","pages":"32"},"PeriodicalIF":9.3,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142630096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-01-01 | Epub Date: 2024-02-28 | DOI: 10.3352/jeehp.2024.21.4
Hiroyasu Sato, Katsuhiko Ogasawara
Purpose: The objective of this study was to assess the performance of ChatGPT (GPT-4) on all items, including those with diagrams, in the Japanese National License Examination for Pharmacists (JNLEP) and compare it with the previous GPT-3.5 model’s performance.
Methods: This study targeted the 107th JNLEP, which was administered in 2022; all 344 items were input into the GPT-4 model. Separately, the 284 items that did not include diagrams were entered into the GPT-3.5 model. The answers were categorized and analyzed to determine accuracy rates by category, subject, and the presence or absence of diagrams. Accuracy rates were compared against the main passing criteria (overall accuracy rate ≥62.9%).
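A small sketch of the scoring step described above: tabulating accuracy overall and by the presence of diagrams, then checking the result against the 62.9% criterion. The item records are invented placeholders, not the actual exam data.

```python
# Tabulating model accuracy overall and by diagram presence, then checking
# the 62.9% passing criterion; records are invented placeholders.
import pandas as pd

records = pd.DataFrame({
    "has_diagram": [False, False, True, True, False],
    "correct":     [True,  True,  False, True, False],
})
overall = records["correct"].mean()
by_diagram = records.groupby("has_diagram")["correct"].mean()
print(f"overall accuracy: {overall:.1%}, meets >= 62.9%: {overall >= 0.629}")
print(by_diagram)
```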
Results: GPT-4 achieved an overall accuracy rate of 72.5% across all items of the 107th JNLEP, successfully meeting all the passing criteria. For the set of items without diagrams, its accuracy rate was 80.0%, significantly higher than that of the GPT-3.5 model (43.5%). For items that included diagrams, the GPT-4 model demonstrated an accuracy rate of 36.1%.
Conclusion: Advancements that allow GPT-4 to process images have made it possible for LLMs to answer all items in medical-related license examinations. This study’s findings confirm that ChatGPT (GPT-4) possesses sufficient knowledge to meet the passing criteria.
{"title":"ChatGPT (GPT-4) passed the Japanese National License Examination for Pharmacists in 2022, answering all items including those with diagrams: a descriptive study.","authors":"Hiroyasu Sato, Katsuhiko Ogasawara","doi":"10.3352/jeehp.2024.21.4","DOIUrl":"10.3352/jeehp.2024.21.4","url":null,"abstract":"<p><strong>Purpose: </strong>The objective of this study was to assess the performance of ChatGPT (GPT-4) on all items, including those with diagrams, in the Japanese National License Examination for Pharmacists (JNLEP) and compare it with the previous GPT-3.5 model’s performance.</p><p><strong>Methods: </strong>The 107th JNLEP, conducted in 2022, with 344 items input into the GPT-4 model, was targeted for this study. Separately, 284 items, excluding those with diagrams, were entered into the GPT-3.5 model. The answers were categorized and analyzed to determine accuracy rates based on categories, subjects, and presence or absence of diagrams. The accuracy rates were compared to the main passing criteria (overall accuracy rate ≥62.9%).</p><p><strong>Results: </strong>The overall accuracy rate for all items in the 107th JNLEP in GPT-4 was 72.5%, successfully meeting all the passing criteria. For the set of items without diagrams, the accuracy rate was 80.0%, which was significantly higher than that of the GPT-3.5 model (43.5%). The GPT-4 model demonstrated an accuracy rate of 36.1% for items that included diagrams.</p><p><strong>Conclusion: </strong>Advancements that allow GPT-4 to process images have made it possible for LLMs to answer all items in medical-related license examinations. This study’s findings confirm that ChatGPT (GPT-4) possesses sufficient knowledge to meet the passing criteria.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"21 ","pages":"4"},"PeriodicalIF":4.4,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10948916/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139984149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}