Pub Date : 2024-01-01Epub Date: 2024-09-24DOI: 10.3352/jeehp.2024.21.26
Ting Sun, Stella Yun Kim, Brigitte Kristin Smith, Yoon Soo Park
Purpose: The System for Improving and Measuring Procedure Learning (SIMPL), a smartphone-based operative assessment application, was developed to assess the intraoperative performance of surgical residents. This study aims to examine the reliability of the SIMPL assessment and determine the optimal number of procedures for a reliable assessment.
Methods: In this retrospective observational study, we analyzed data collected between 2015 and 2023 from 4,616 residents across 94 General Surgery Residency programs in the United States that utilized the SIMPL smartphone application. We employed multivariate generalizability theory and initially conducted generalizability studies to estimate the variance components associated with procedures. We then performed decision studies to estimate the reliability coefficient and the minimum number of procedures required for a reproducible assessment.
Results: We estimated that the reliability of the assessment of surgical trainees’ intraoperative autonomy and performance using SIMPL exceeded 0.70. Additionally, the optimal number of procedures required for a reproducible assessment was 10, 17, 15, and 17 for postgraduate year (PGY) 2, PGY 3, PGY 4, and PGY 5, respectively. Notably, the study highlighted that the assessment of residents in their senior years necessitated a larger number of procedures compared to those in their junior years.
Conclusion: The study demonstrated that the SIMPL assessment is reliably effective for evaluating the intraoperative performance of surgical trainees. Adjusting the number of procedures based on the trainees’ training stage enhances the assessment process’s accuracy and effectiveness.
{"title":"Reliability of a workplace-based assessment for the United States general surgical trainees’ intraoperative performance using multivariate generalizability theory: a psychometric study","authors":"Ting Sun, Stella Yun Kim, Brigitte Kristin Smith, Yoon Soo Park","doi":"10.3352/jeehp.2024.21.26","DOIUrl":"10.3352/jeehp.2024.21.26","url":null,"abstract":"<p><strong>Purpose: </strong>The System for Improving and Measuring Procedure Learning (SIMPL), a smartphone-based operative assessment application, was developed to assess the intraoperative performance of surgical residents. This study aims to examine the reliability of the SIMPL assessment and determine the optimal number of procedures for a reliable assessment.</p><p><strong>Methods: </strong>In this retrospective observational study, we analyzed data collected between 2015 and 2023 from 4,616 residents across 94 General Surgery Residency programs in the United States that utilized the SIMPL smartphone application. We employed multivariate generalizability theory and initially conducted generalizability studies to estimate the variance components associated with procedures. We then performed decision studies to estimate the reliability coefficient and the minimum number of procedures required for a reproducible assessment.</p><p><strong>Results: </strong>We estimated that the reliability of the assessment of surgical trainees’ intraoperative autonomy and performance using SIMPL exceeded 0.70. Additionally, the optimal number of procedures required for a reproducible assessment was 10, 17, 15, and 17 for postgraduate year (PGY) 2, PGY 3, PGY 4, and PGY 5, respectively. Notably, the study highlighted that the assessment of residents in their senior years necessitated a larger number of procedures compared to those in their junior years.</p><p><strong>Conclusion: </strong>The study demonstrated that the SIMPL assessment is reliably effective for evaluating the intraoperative performance of surgical trainees. Adjusting the number of procedures based on the trainees’ training stage enhances the assessment process’s accuracy and effectiveness.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"21 ","pages":"26"},"PeriodicalIF":9.3,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142356104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-01-01Epub Date: 2024-08-20DOI: 10.3352/jeehp.2024.21.21
Sebastian Ebel, Constantin Ehrengut, Timm Denecke, Holger Gößmann, Anne Bettina Beeskow
Purpose: This study aimed to determine whether ChatGPT-4o, a generative artificial intelligence (AI) platform, was able to pass a simulated written European Board of Interventional Radiology (EBIR) exam and whether GPT-4o can be used to train medical students and interventional radiologists of different levels of expertise by generating exam items on interventional radiology.
Methods: GPT-4o was asked to answer 370 simulated exam items of the Cardiovascular and Interventional Radiology Society of Europe (CIRSE) for EBIR preparation (CIRSE Prep). Subsequently, GPT-4o was requested to generate exam items on interventional radiology topics at levels of difficulty suitable for medical students and the EBIR exam. Those generated items were answered by 4 participants, including a medical student, a resident, a consultant, and an EBIR holder. The correctly answered items were counted. One investigator checked the answers and items generated by GPT-4o for correctness and relevance. This work was done from April to July 2024.
Results: GPT-4o correctly answered 248 of the 370 CIRSE Prep items (67.0%). For 50 CIRSE Prep items, the medical student answered 46.0%, the resident 42.0%, the consultant 50.0%, and the EBIR holder 74.0% correctly. All participants answered 82.0% to 92.0% of the 50 GPT-4o generated items at the student level correctly. For the 50 GPT-4o items at the EBIR level, the medical student answered 32.0%, the resident 44.0%, the consultant 48.0%, and the EBIR holder 66.0% correctly. All participants could pass the GPT-4o-generated items for the student level; while the EBIR holder could pass the GPT-4o-generated items for the EBIR level. Two items (0.3%) out of 150 generated by the GPT-4o were assessed as implausible.
Conclusion: GPT-4o could pass the simulated written EBIR exam and create exam items of varying difficulty to train medical students and interventional radiologists.
{"title":"GPT-4o’s competency in answering the simulated written European Board of Interventional Radiology exam compared to a medical student and experts in Germany and its ability to generate exam items on interventional radiology: a descriptive study.","authors":"Sebastian Ebel, Constantin Ehrengut, Timm Denecke, Holger Gößmann, Anne Bettina Beeskow","doi":"10.3352/jeehp.2024.21.21","DOIUrl":"10.3352/jeehp.2024.21.21","url":null,"abstract":"<p><strong>Purpose: </strong>This study aimed to determine whether ChatGPT-4o, a generative artificial intelligence (AI) platform, was able to pass a simulated written European Board of Interventional Radiology (EBIR) exam and whether GPT-4o can be used to train medical students and interventional radiologists of different levels of expertise by generating exam items on interventional radiology.</p><p><strong>Methods: </strong>GPT-4o was asked to answer 370 simulated exam items of the Cardiovascular and Interventional Radiology Society of Europe (CIRSE) for EBIR preparation (CIRSE Prep). Subsequently, GPT-4o was requested to generate exam items on interventional radiology topics at levels of difficulty suitable for medical students and the EBIR exam. Those generated items were answered by 4 participants, including a medical student, a resident, a consultant, and an EBIR holder. The correctly answered items were counted. One investigator checked the answers and items generated by GPT-4o for correctness and relevance. This work was done from April to July 2024.</p><p><strong>Results: </strong>GPT-4o correctly answered 248 of the 370 CIRSE Prep items (67.0%). For 50 CIRSE Prep items, the medical student answered 46.0%, the resident 42.0%, the consultant 50.0%, and the EBIR holder 74.0% correctly. All participants answered 82.0% to 92.0% of the 50 GPT-4o generated items at the student level correctly. For the 50 GPT-4o items at the EBIR level, the medical student answered 32.0%, the resident 44.0%, the consultant 48.0%, and the EBIR holder 66.0% correctly. All participants could pass the GPT-4o-generated items for the student level; while the EBIR holder could pass the GPT-4o-generated items for the EBIR level. Two items (0.3%) out of 150 generated by the GPT-4o were assessed as implausible.</p><p><strong>Conclusion: </strong>GPT-4o could pass the simulated written EBIR exam and create exam items of varying difficulty to train medical students and interventional radiologists.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"21 ","pages":"21"},"PeriodicalIF":9.3,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142005513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-11-10DOI: 10.3352/jeehp.2023.20.29
Janghee Park
Purpose: This study aimed to analyze patterns of using ChatGPT before and after group activities and to explore medical students’ perceptions of ChatGPT as a feedback tool in the classroom.Methods: The study included 99 2nd-year pre-medical students who participated in a “Leadership and Communication” course from March to June 2023. Students engaged in both individual and group activities related to negotiation strategies. ChatGPT was used to provide feedback on their solutions. A survey was administered to assess students’ perceptions of ChatGPT’s feedback, its use in the classroom, and the strengths and challenges of ChatGPT from May 17 to 19, 2023.Results: The students responded by indicating that ChatGPT’s feedback was helpful, and revised and resubmitted their group answers in various ways after receiving feedback. The majority of respondents expressed agreement with the use of ChatGPT during class. The most common response concerning the appropriate context of using ChatGPT’s feedback was “after the first round of discussion, for revisions.” There was a significant difference in satisfaction with ChatGPT’s feedback, including correctness, usefulness, and ethics, depending on whether or not ChatGPT was used during class, but there was no significant difference according to gender or whether students had previous experience with ChatGPT. The strongest advantages were “providing answers to questions” and “summarizing information,” and the worst disadvantage was “producing information without supporting evidence.”Conclusion: The students were aware of the advantages and disadvantages of ChatGPT, and they had a positive attitude toward using ChatGPT in the classroom.
{"title":"Medical students’ patterns of using ChatGPT as a feedback tool and perceptions of ChatGPT in a Leadership and Communication course in Korea: a cross-sectional study","authors":"Janghee Park","doi":"10.3352/jeehp.2023.20.29","DOIUrl":"https://doi.org/10.3352/jeehp.2023.20.29","url":null,"abstract":"Purpose: This study aimed to analyze patterns of using ChatGPT before and after group activities and to explore medical students’ perceptions of ChatGPT as a feedback tool in the classroom.Methods: The study included 99 2nd-year pre-medical students who participated in a “Leadership and Communication” course from March to June 2023. Students engaged in both individual and group activities related to negotiation strategies. ChatGPT was used to provide feedback on their solutions. A survey was administered to assess students’ perceptions of ChatGPT’s feedback, its use in the classroom, and the strengths and challenges of ChatGPT from May 17 to 19, 2023.Results: The students responded by indicating that ChatGPT’s feedback was helpful, and revised and resubmitted their group answers in various ways after receiving feedback. The majority of respondents expressed agreement with the use of ChatGPT during class. The most common response concerning the appropriate context of using ChatGPT’s feedback was “after the first round of discussion, for revisions.” There was a significant difference in satisfaction with ChatGPT’s feedback, including correctness, usefulness, and ethics, depending on whether or not ChatGPT was used during class, but there was no significant difference according to gender or whether students had previous experience with ChatGPT. The strongest advantages were “providing answers to questions” and “summarizing information,” and the worst disadvantage was “producing information without supporting evidence.”Conclusion: The students were aware of the advantages and disadvantages of ChatGPT, and they had a positive attitude toward using ChatGPT in the classroom.","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"99 27","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135092034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
At the end of 2022, the appearance of ChatGPT, an artificial intelligence (AI) chatbot with amazing writing ability, caused a great sensation in academia. The chatbot turned out to be very capable, but also capable of deception, and the news broke that several researchers had listed the chatbot (including its earlier version) as co-authors of their academic papers. In response, Nature and Science expressed their position that this chatbot cannot be listed as an author in the papers they publish. Since an AI chatbot is not a human being, in the current legal system, the text automatically generated by an AI chatbot cannot be a copyrighted work; thus, an AI chatbot cannot be an author of a copyrighted work. Current AI chatbots such as ChatGPT are much more advanced than search engines in that they produce original text, but they still remain at the level of a search engine in that they cannot take responsibility for their writing. For this reason, they also cannot be authors from the perspective of research ethics.
{"title":"Can an artificial intelligence chatbot be the author of a scholarly article?","authors":"Ju Yoen Lee","doi":"10.3352/jeehp.2022.20.6","DOIUrl":"https://doi.org/10.3352/jeehp.2022.20.6","url":null,"abstract":"At the end of 2022, the appearance of ChatGPT, an artificial intelligence (AI) chatbot with amazing writing ability, caused a great sensation in academia. The chatbot turned out to be very capable, but also capable of deception, and the news broke that several researchers had listed the chatbot (including its earlier version) as co-authors of their academic papers. In response, Nature and Science expressed their position that this chatbot cannot be listed as an author in the papers they publish. Since an AI chatbot is not a human being, in the current legal system, the text automatically generated by an AI chatbot cannot be a copyrighted work; thus, an AI chatbot cannot be an author of a copyrighted work. Current AI chatbots such as ChatGPT are much more advanced than search engines in that they produce original text, but they still remain at the level of a search engine in that they cannot take responsibility for their writing. For this reason, they also cannot be authors from the perspective of research ethics.","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135892320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-01-11DOI: 10.3352/jeehp.2023.20.01
Sun Huh
This study aimed to compare the knowledge and interpretation ability of ChatGPT, a language model of artificial general intelligence, with those of medical students in Korea by administering a parasitology examination to both ChatGPT and medical students. The examination consisted of 79 items and was administered to ChatGPT on January 1, 2023. The examination results were analyzed in terms of ChatGPT’s overall performance score, its correct answer rate by the items’ knowledge level, and the acceptability of its explanations of the items. ChatGPT’s performance was lower than that of the medical students, and ChatGPT’s correct answer rate was not related to the items’ knowledge level. However, there was a relationship between acceptable explanations and correct answers. In conclusion, ChatGPT’s knowledge and interpretation ability for this parasitology examination were not yet comparable to those of medical students in Korea.
{"title":"Are ChatGPT’s knowledge and interpretation ability comparable to those of medical students in Korea for taking a parasitology examination?: a descriptive study","authors":"Sun Huh","doi":"10.3352/jeehp.2023.20.01","DOIUrl":"https://doi.org/10.3352/jeehp.2023.20.01","url":null,"abstract":"This study aimed to compare the knowledge and interpretation ability of ChatGPT, a language model of artificial general intelligence, with those of medical students in Korea by administering a parasitology examination to both ChatGPT and medical students. The examination consisted of 79 items and was administered to ChatGPT on January 1, 2023. The examination results were analyzed in terms of ChatGPT’s overall performance score, its correct answer rate by the items’ knowledge level, and the acceptability of its explanations of the items. ChatGPT’s performance was lower than that of the medical students, and ChatGPT’s correct answer rate was not related to the items’ knowledge level. However, there was a relationship between acceptable explanations and correct answers. In conclusion, ChatGPT’s knowledge and interpretation ability for this parasitology examination were not yet comparable to those of medical students in Korea.","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"20 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2023-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45226107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-01-01DOI: 10.3352/jeehp.2023.20.19
Ali Khalafi, Maedeh Kordnejad, Vahid Saidkhani
Purpose: This study investigated the effect of evaluations based on the Anesthetic List Management Assessment Tool (ALMAT) form on improving the technical and non-technical skills of final-year nurse anesthesia students at Ahvaz Jundishapur University of Medical Sciences (AJUMS).
Methods: This was a semi-experimental study with a pre-test and post-test design. It included 45 final-year nurse anesthesia students of AJUMS and lasted for 3 months. The technical and non-technical skills of the intervention group were assessed at 4 university hospitals using formative-feedback evaluation based on the ALMAT form, from induction of anesthesia until reaching mastery and independence. Finally, the students’ degree of improvement in technical and non-technical skills was compared between the intervention and control groups. Statistical tests (the independent t-test, paired t-test, and Mann-Whitney test) were used to analyze the data.
Results: The rate of improvement in post-test scores of technical skills was significantly higher in the intervention group than in the control group (P<0.0001). Similarly, the students in the intervention group received significantly higher post-test scores for non-technical skills than the students in the control group (P<0.0001).
Conclusion: The findings of this study showed that the use of ALMAT as a formative-feedback evaluation method to evaluate technical and non-technical skills had a significant effect on improving these skills and was effective in helping students learn and reach mastery and independence.
{"title":"Enhancement of the technical and non-technical skills of nurse anesthesia students using the Anesthetic List Management Assessment Tool in Iran: a quasi-experimental study.","authors":"Ali Khalafi, Maedeh Kordnejad, Vahid Saidkhani","doi":"10.3352/jeehp.2023.20.19","DOIUrl":"https://doi.org/10.3352/jeehp.2023.20.19","url":null,"abstract":"<p><strong>Purpose: </strong>This study investigated the effect of evaluations based on the Anesthetic List Management Assessment Tool (ALMAT) form on improving the technical and non-technical skills of final-year nurse anesthesia students at Ahvaz Jundishapur University of Medical Sciences (AJUMS).</p><p><strong>Methods: </strong>This was a semi-experimental study with a pre-test and post-test design. It included 45 final-year nurse anesthesia students of AJUMS and lasted for 3 months. The technical and non-technical skills of the intervention group were assessed at 4 university hospitals using formative-feedback evaluation based on\u0000the ALMAT form, from induction of anesthesia until reaching mastery and independence. Finally, the students’ degree of improvement in technical and non-technical skills was compared between the intervention and control groups. Statistical tests (the independent t-test, paired t-test, and Mann-Whitney test) were used to analyze the data.</p><p><strong>Results: </strong>The rate of improvement in post-test scores of technical skills was significantly higher in the intervention group than in the control group (P<0.0001). Similarly, the students in the intervention group received significantly higher post-test scores for non-technical skills than the students in the control group (P<0.0001).</p><p><strong>Conclusion: </strong>The findings of this study showed that the use of ALMAT as a formative-feedback evaluation method to evaluate technical and non-technical skills had a significant effect on improving these skills and was effective in helping students learn and reach mastery and independence.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"20 ","pages":"19"},"PeriodicalIF":4.4,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10352009/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9833205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pin-Hsiang Huang, Gary Velan, Greg Smith, Melanie Fentoullis, Sean Edward Kennedy, Karen Jane Gibson, Kerry Uebel, Boaz Shulruf
Purpose: This study evaluated the validity of student feedback derived from Medicine Student Experience Questionnaire (MedSEQ), as well as the predictors of students' satisfaction in the Medicine program.
Methods: Data from MedSEQ applying to the University of New South Wales Medicine program in 2017, 2019, and 2021 were analyzed. Confirmatory factor analysis (CFA) and Cronbach's α were used to assess the construct validity and reliability of MedSEQ respectively. Hierarchical multiple linear regressions were used to identify the factors that most impact students' overall satisfaction with the program.
Results: A total of 1,719 students (34.50%) responded to MedSEQ. CFA showed good fit indices (root mean square error of approximation=0.051; comparative fit index=0.939; chi-square/degrees of freedom=6.429). All factors yielded good (α>0.7) or very good (α>0.8) levels of reliability, except the "online resources" factor, which had acceptable reliability (α=0.687). A multiple linear regression model with only demographic characteristics explained 3.8% of the variance in students' overall satisfaction, whereas the model adding 8 domains from MedSEQ explained 40%, indicating that 36.2% of the variance was attributable to students' experience across the 8 domains. Three domains had the strongest impact on overall satisfaction: "being cared for," "satisfaction with teaching," and "satisfaction with assessment" (β=0.327, 0.148, 0.148, respectively; all with P<0.001).
Conclusion: MedSEQ has good construct validity and high reliability, reflecting students' satisfaction with the Medicine program. Key factors impacting students' satisfaction are the perception of being cared for, quality teaching irrespective of the mode of delivery and fair assessment tasks which enhance learning.
{"title":"What impacts students' satisfaction the most from Medicine Student Experience Questionnaire in Australia: a validity study.","authors":"Pin-Hsiang Huang, Gary Velan, Greg Smith, Melanie Fentoullis, Sean Edward Kennedy, Karen Jane Gibson, Kerry Uebel, Boaz Shulruf","doi":"10.3352/jeehp.2023.20.2","DOIUrl":"https://doi.org/10.3352/jeehp.2023.20.2","url":null,"abstract":"<p><strong>Purpose: </strong>This study evaluated the validity of student feedback derived from Medicine Student Experience Questionnaire (MedSEQ), as well as the predictors of students' satisfaction in the Medicine program.</p><p><strong>Methods: </strong>Data from MedSEQ applying to the University of New South Wales Medicine program in 2017, 2019, and 2021 were analyzed. Confirmatory factor analysis (CFA) and Cronbach's α were used to assess the construct validity and reliability of MedSEQ respectively. Hierarchical multiple linear regressions were used to identify the factors that most impact students' overall satisfaction with the program.</p><p><strong>Results: </strong>A total of 1,719 students (34.50%) responded to MedSEQ. CFA showed good fit indices (root mean square error of approximation=0.051; comparative fit index=0.939; chi-square/degrees of freedom=6.429). All factors yielded good (α>0.7) or very good (α>0.8) levels of reliability, except the \"online resources\" factor, which had acceptable reliability (α=0.687). A multiple linear regression model with only demographic characteristics explained 3.8% of the variance in students' overall satisfaction, whereas the model adding 8 domains from MedSEQ explained 40%, indicating that 36.2% of the variance was attributable to students' experience across the 8 domains. Three domains had the strongest impact on overall satisfaction: \"being cared for,\" \"satisfaction with teaching,\" and \"satisfaction with assessment\" (β=0.327, 0.148, 0.148, respectively; all with P<0.001).</p><p><strong>Conclusion: </strong>MedSEQ has good construct validity and high reliability, reflecting students' satisfaction with the Medicine program. Key factors impacting students' satisfaction are the perception of being cared for, quality teaching irrespective of the mode of delivery and fair assessment tasks which enhance learning.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"20 ","pages":"2"},"PeriodicalIF":4.4,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9986309/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10866573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-01-01Epub Date: 2023-10-16DOI: 10.3352/jeehp.2023.20.28
Aleksandra Ignjatović, Lazar Stevanović
Purpose This study aimed to assess the performance of ChatGPT (GPT-3.5 and GPT-4) as a study tool in solving biostatistical problems and to identify any potential drawbacks that might arise from using ChatGPT in medical education, particularly in solving practical biostatistical problems. Methods ChatGPT was tested to evaluate its ability to solve biostatistical problems from the Handbook of Medical Statistics by Peacock and Peacock in this descriptive study. Tables from the problems were transformed into textual questions. Ten biostatistical problems were randomly chosen and used as text-based input for conversation with ChatGPT (versions 3.5 and 4). Results GPT-3.5 solved 5 practical problems in the first attempt, related to categorical data, cross-sectional study, measuring reliability, probability properties, and the t-test. GPT-3.5 failed to provide correct answers regarding analysis of variance, the chi-square test, and sample size within 3 attempts. GPT-4 also solved a task related to the confidence interval in the first attempt and solved all questions within 3 attempts, with precise guidance and monitoring. Conclusion The assessment of both versions of ChatGPT performance in 10 biostatistical problems revealed that GPT-3.5 and 4’s performance was below average, with correct response rates of 5 and 6 out of 10 on the first attempt. GPT-4 succeeded in providing all correct answers within 3 attempts. These findings indicate that students must be aware that this tool, even when providing and calculating different statistical analyses, can be wrong, and they should be aware of ChatGPT’s limitations and be careful when incorporating this model into medical education.
{"title":"Efficacy and limitations of ChatGPT as a biostatistical problem-solving tool in medical education in Serbia: a descriptive study","authors":"Aleksandra Ignjatović, Lazar Stevanović","doi":"10.3352/jeehp.2023.20.28","DOIUrl":"10.3352/jeehp.2023.20.28","url":null,"abstract":"Purpose This study aimed to assess the performance of ChatGPT (GPT-3.5 and GPT-4) as a study tool in solving biostatistical problems and to identify any potential drawbacks that might arise from using ChatGPT in medical education, particularly in solving practical biostatistical problems. Methods ChatGPT was tested to evaluate its ability to solve biostatistical problems from the Handbook of Medical Statistics by Peacock and Peacock in this descriptive study. Tables from the problems were transformed into textual questions. Ten biostatistical problems were randomly chosen and used as text-based input for conversation with ChatGPT (versions 3.5 and 4). Results GPT-3.5 solved 5 practical problems in the first attempt, related to categorical data, cross-sectional study, measuring reliability, probability properties, and the t-test. GPT-3.5 failed to provide correct answers regarding analysis of variance, the chi-square test, and sample size within 3 attempts. GPT-4 also solved a task related to the confidence interval in the first attempt and solved all questions within 3 attempts, with precise guidance and monitoring. Conclusion The assessment of both versions of ChatGPT performance in 10 biostatistical problems revealed that GPT-3.5 and 4’s performance was below average, with correct response rates of 5 and 6 out of 10 on the first attempt. GPT-4 succeeded in providing all correct answers within 3 attempts. These findings indicate that students must be aware that this tool, even when providing and calculating different statistical analyses, can be wrong, and they should be aware of ChatGPT’s limitations and be careful when incorporating this model into medical education.","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"20 ","pages":"28"},"PeriodicalIF":4.4,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10646144/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41239759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-01-01DOI: 10.3352/jeehp.2023.20.15
Yun Mi Kim, Sun Hee Lee, Sun Ok Lee, Mi Young An, Bu Youn Kim, Jum Mi Park
Purpose: The number of Korean midwifery licensing examination applicants has steadily decreased due to the low birth rate and lack of training institutions for midwives. This study aimed to evaluate the adequacy of the examination-based licensing system and the possibility of a training-based licensing system.
Methods: A survey questionnaire was developed and dispatched to 230 professionals from December 28, 2022 to January 13, 2023, through an online form using Google Surveys. Descriptive statistics were used to analyze the results.
Results: Responses from 217 persons (94.3%) were analyzed after excluding incomplete responses. Out of the 217 participants, 198 (91.2%) agreed with maintaining the current examination-based licensing system; 94 (43.3%) agreed with implementing a training-based licensing system to cover the examination costs due to the decreasing number of applicants; 132 (60.8%) agreed with establishing a midwifery education evaluation center for a training-based licensing system; 163 (75.1%) said that the quality of midwifery might be lowered if midwives were produced only by a training-based licensing system, and 197 (90.8%) said that the training of midwives as birth support personnel should be promoted in Korea.
Conclusion: Favorable results were reported for the examination-based licensing system; however, if a training-based licensing system is implemented, it will be necessary to establish a midwifery education evaluation center to manage the quality of midwives. As the annual number of candidates for the Korean midwifery licensing examination has been approximately 10 in recent years, it is necessary to consider more actively granting midwifery licenses through a training-based licensing system.
{"title":"Adequacy of the examination-based licensing system and a training-based licensing system for midwifery license according to changes in childbirth medical infrastructure in Korea: a surveybased descriptive study","authors":"Yun Mi Kim, Sun Hee Lee, Sun Ok Lee, Mi Young An, Bu Youn Kim, Jum Mi Park","doi":"10.3352/jeehp.2023.20.15","DOIUrl":"https://doi.org/10.3352/jeehp.2023.20.15","url":null,"abstract":"<p><strong>Purpose: </strong>The number of Korean midwifery licensing examination applicants has steadily decreased due to the low birth rate and lack of training institutions for midwives. This study aimed to evaluate the adequacy of the examination-based licensing system and the possibility of a training-based licensing system.</p><p><strong>Methods: </strong>A survey questionnaire was developed and dispatched to 230 professionals from December 28, 2022 to January 13, 2023, through an online form using Google Surveys. Descriptive statistics were used to analyze the results.</p><p><strong>Results: </strong>Responses from 217 persons (94.3%) were analyzed after excluding incomplete responses. Out of the 217 participants, 198 (91.2%) agreed with maintaining the current examination-based licensing system; 94 (43.3%) agreed with implementing a training-based licensing system to cover the examination costs due to the decreasing number of applicants; 132 (60.8%) agreed with establishing a midwifery education evaluation center for a training-based licensing system; 163 (75.1%) said that the quality of midwifery might be lowered if midwives were produced only by a training-based licensing system, and 197 (90.8%) said that the training of midwives as birth support personnel should be promoted in Korea.</p><p><strong>Conclusion: </strong>Favorable results were reported for the examination-based licensing system; however, if a training-based licensing system is implemented, it will be necessary to establish a midwifery education evaluation center to manage the quality of midwives. As the annual number of candidates for the Korean midwifery licensing examination has been approximately 10 in recent years, it is necessary to consider more actively granting midwifery licenses through a training-based licensing system.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"20 ","pages":"15"},"PeriodicalIF":4.4,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10325871/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9761228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-01-01Epub Date: 2023-07-17DOI: 10.3352/jeehp.2023.20.23
Chan-Young Kwon, Sanghoon Lee, Min Hwangbo, Chungsik Cho, Sangwoo Shin, Dong-Hyeon Kim, Aram Jeong, Hye-Yoon Lee
Purpose This study investigated the validity of introducing a clinical skills examination (CSE) to the Korean Oriental Medicine Licensing Examination through a mixed-method modified Delphi study. Methods A 3-round Delphi study was conducted between September and November 2022. The expert panel comprised 21 oriental medicine education experts who were officially recommended by relevant institutions and organizations. The questionnaires included potential content for the CSE and a detailed implementation strategy. Subcommittees were formed to discuss concerns around the introduction of the CSE, which were collected as open-ended questions. In this study, a 66.7% or greater agreement rate was defined as achieving a consensus. Results The expert panel’s evaluation of the proposed clinical presentations and basic clinical skills suggested their priorities. Of the 10 items investigated for building a detailed implementation strategy for the introduction of the CSE to the Korean Oriental Medicine Licensing Examination, a consensus was achieved on 9. However, the agreement rate on the timing of the introduction of the CSE was low. Concerns around 4 clinical topics were discussed in the subcommittees, and potential solutions were proposed. Conclusion This study offers preliminary data and raises some concerns that can be used as a reference while discussing the introduction of the CSE to the Korean Oriental Medicine Licensing Examination.
{"title":"Implementation strategy for introducing a clinical skills examination to the Korean Oriental Medicine Licensing Examination: a mixed-method modified Delphi study","authors":"Chan-Young Kwon, Sanghoon Lee, Min Hwangbo, Chungsik Cho, Sangwoo Shin, Dong-Hyeon Kim, Aram Jeong, Hye-Yoon Lee","doi":"10.3352/jeehp.2023.20.23","DOIUrl":"10.3352/jeehp.2023.20.23","url":null,"abstract":"Purpose This study investigated the validity of introducing a clinical skills examination (CSE) to the Korean Oriental Medicine Licensing Examination through a mixed-method modified Delphi study. Methods A 3-round Delphi study was conducted between September and November 2022. The expert panel comprised 21 oriental medicine education experts who were officially recommended by relevant institutions and organizations. The questionnaires included potential content for the CSE and a detailed implementation strategy. Subcommittees were formed to discuss concerns around the introduction of the CSE, which were collected as open-ended questions. In this study, a 66.7% or greater agreement rate was defined as achieving a consensus. Results The expert panel’s evaluation of the proposed clinical presentations and basic clinical skills suggested their priorities. Of the 10 items investigated for building a detailed implementation strategy for the introduction of the CSE to the Korean Oriental Medicine Licensing Examination, a consensus was achieved on 9. However, the agreement rate on the timing of the introduction of the CSE was low. Concerns around 4 clinical topics were discussed in the subcommittees, and potential solutions were proposed. Conclusion This study offers preliminary data and raises some concerns that can be used as a reference while discussing the introduction of the CSE to the Korean Oriental Medicine Licensing Examination.","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"20 ","pages":"23"},"PeriodicalIF":4.4,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10432826/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10024447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}