
Latest articles from the Journal of Educational Evaluation for Health Professions

Inter-rater reliability and content validity of the measurement tool for portfolio assessments used in the Introduction to Clinical Medicine course at Ewha Womans University College of Medicine: a methodological study.
IF 9.3 Q1 EDUCATION, SCIENTIFIC DISCIPLINES Pub Date : 2024-01-01 Epub Date: 2024-12-10 DOI: 10.3352/jeehp.2024.21.39
Dong-Mi Yoo, Jae Jin Han

Purpose: This study aimed to examine the reliability and validity of a measurement tool for portfolio assessments in medical education. Specifically, it investigated scoring consistency among raters and the appropriateness of the assessment criteria as judged by an expert panel.

Methods: A cross-sectional observational study was conducted from September to December 2018 for the Introduction to Clinical Medicine course at the Ewha Womans University College of Medicine. Data were collected for 5 randomly selected portfolios scored by a gold-standard rater and 6 trained raters. An expert panel assessed the validity of 12 assessment items using the content validity index (CVI). Statistical analysis included Pearson correlation coefficients for rater alignment, the intraclass correlation coefficient (ICC) for inter-rater reliability, and the CVI for item-level validity.

Results: Rater 1 had the highest Pearson correlation (0.8916) with the gold-standard rater, while Rater 5 had the lowest (0.4203). The ICC for all raters was 0.3821, improving to 0.4415 after excluding Raters 1 and 5, indicating a 15.6% reliability increase. All assessment items met the CVI threshold of ≥0.75, with some achieving a perfect score (CVI=1.0). However, items like "sources" and "level and degree of performance" showed lower validity (CVI=0.72).

Conclusion: The present measurement tool for portfolio assessments demonstrated moderate reliability and strong validity, supporting its use as a credible tool. For a more reliable portfolio assessment, more faculty training is needed.
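
As an illustration of the statistics named in the Methods (rater-versus-gold-standard Pearson correlations, an intraclass correlation coefficient, and the item-level content validity index), the following Python sketch computes them on invented ratings; the ICC form and the CVI rating scale are assumptions, since the abstract does not specify them.

```python
# A minimal sketch (not the authors' code) of the three statistics reported above:
# Pearson correlation of each rater against a gold-standard rater, a one-way
# random-effects ICC across raters, and the item-level content validity index (CVI).
# All numbers are made-up placeholders for illustration.
import numpy as np
from scipy.stats import pearsonr

# ratings[i, j] = score given to portfolio i by rater j (5 portfolios x 6 raters)
ratings = np.array([
    [78, 74, 80, 69, 72, 75],
    [85, 82, 88, 79, 70, 84],
    [62, 65, 60, 58, 66, 61],
    [90, 87, 91, 83, 78, 89],
    [71, 70, 74, 66, 80, 72],
], dtype=float)
gold = np.array([80, 86, 63, 92, 73], dtype=float)  # gold-standard rater's scores

# 1) Rater alignment: Pearson correlation of each rater with the gold standard.
for j in range(ratings.shape[1]):
    r, _ = pearsonr(ratings[:, j], gold)
    print(f"Rater {j + 1} vs gold standard: r = {r:.4f}")

# 2) Inter-rater reliability: one-way random-effects ICC(1,1).
#    (The abstract does not state which ICC form was used; this is one common choice.)
def icc_oneway(x: np.ndarray) -> float:
    n, k = x.shape  # n targets (portfolios), k raters
    grand = x.mean()
    ms_between = k * ((x.mean(axis=1) - grand) ** 2).sum() / (n - 1)
    ms_within = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum() / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

print(f"ICC(1,1) over all raters: {icc_oneway(ratings):.4f}")

# 3) Item-level CVI: proportion of experts rating an item 3 or 4 on a 4-point
#    relevance scale; items are often retained when CVI >= 0.75 (or 0.78).
expert_ratings = np.array([4, 4, 3, 2, 4, 3, 4, 4])  # one item, 8 experts (hypothetical)
cvi = (expert_ratings >= 3).mean()
print(f"Item CVI = {cvi:.2f}")
```

Dropping an outlying rater and recomputing the ICC on the remaining columns reproduces the kind of before-and-after comparison reported in the Results.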

{"title":"Inter-rater reliability and content validity of the measurement tool for portfolio assessments used in the Introduction to Clinical Medicine course at Ewha Womans University College of Medicine: a methodological study.","authors":"Dong-Mi Yoo, Jae Jin Han","doi":"10.3352/jeehp.2024.21.39","DOIUrl":"10.3352/jeehp.2024.21.39","url":null,"abstract":"<p><strong>Purpose: </strong>This study aimed to examine the reliability and validity of a measurement tool for portfolio assessments in medical education. Specifically, it investigated scoring consistency among raters and assessment criteria appropriateness according to an expert panel.</p><p><strong>Methods: </strong>A cross-sectional observational study was conducted from September to December 2018 for the Introduction to Clinical Medicine course at the Ewha Womans University College of Medicine. Data were collected for 5 randomly selected portfolios scored by a gold-standard rater and 6 trained raters. An expert panel assessed the validity of 12 assessment items using the content validity index (CVI). Statistical analysis included Pearson correlation coefficients for rater alignment, the intraclass correlation coefficient (ICC) for inter-rater reliability, and the CVI for item-level validity.</p><p><strong>Results: </strong>Rater 1 had the highest Pearson correlation (0.8916) with the gold-standard rater, while Rater 5 had the lowest (0.4203). The ICC for all raters was 0.3821, improving to 0.4415 after excluding Raters 1 and 5, indicating a 15.6% reliability increase. All assessment items met the CVI threshold of ≥0.75, with some achieving a perfect score (CVI=1.0). However, items like \"sources\" and \"level and degree of performance\" showed lower validity (CVI=0.72).</p><p><strong>Conclusion: </strong>The present measurement tool for portfolio assessments demonstrated moderate reliability and strong validity, supporting its use as a credible tool. For a more reliable portfolio assessment, more faculty training is needed.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"21 ","pages":"39"},"PeriodicalIF":9.3,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11717432/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142802676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Revised evaluation objectives of the Korean Dentist Clinical Skill Test: a survey study and focus group interviews
IF 9.3 Q1 EDUCATION, SCIENTIFIC DISCIPLINES Pub Date : 2024-01-01 Epub Date: 2024-05-30 DOI: 10.3352/jeehp.2024.21.11
Jae-Hoon Kim, Young J Kim, Deuk-Sang Ma, Se-Hee Park, Ahran Pae, June-Sung Shim, Il-Hyung Yang, Ui-Won Jung, Byung-Joon Choi, Yang-Hyun Chun

Purpose: This study aimed to propose a revision of the evaluation objectives of the Korean Dentist Clinical Skill Test by analyzing the opinions of those involved in the examination after a review of those objectives.

Methods: The clinical skill test objectives were reviewed based on the national-level dental practitioner competencies, dental school educational competencies, and the third dental practitioner job analysis. Current and former examinees were surveyed about their perceptions of the evaluation objectives. Professors who participated in the clinical skill test and dental school faculty members rated the validity of 22 evaluation objectives, and their perceived overlap with dental specialty areas, on a 5-point Likert scale. Additionally, focus group interviews were conducted with experts on the examination.

Results: It was necessary to consider including competency assessments for “emergency rescue skills” and “planning and performing prosthetic treatment.” There were no significant differences between current and former examinees in their perceptions of the clinical skill test’s objectives. The professors who participated in the examination and dental school faculty members recognized that most of the objectives were valid. However, some responses stated that “oromaxillofacial cranial nerve examination,” “temporomandibular disorder palpation test,” and “space management for primary and mixed dentition” were unfeasible evaluation objectives and overlapped with dental specialty areas.

Conclusion: When revising the Korean Dentist Clinical Skill Test’s objectives, it is advisable to consider incorporating competency assessments related to “emergency rescue skills” and “planning and performing prosthetic treatment.”
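
For readers unfamiliar with how such a validity survey is typically summarized, the sketch below aggregates hypothetical 5-point Likert ratings per evaluation objective and flags low-rated objectives; the objective names, ratings, and cutoff are illustrative assumptions, not data from the study.

```python
# A minimal sketch of summarizing Likert validity ratings per evaluation objective.
# Ratings, objectives, and the review cutoff are hypothetical.
import pandas as pd

ratings = pd.DataFrame({
    "objective": ["emergency rescue skills"] * 4
                 + ["oromaxillofacial cranial nerve examination"] * 4,
    "rating": [5, 4, 5, 4, 2, 3, 2, 3],  # one row per respondent per objective
})

summary = ratings.groupby("objective")["rating"].agg(["mean", "count"])
summary["flag_for_review"] = summary["mean"] < 3.5  # hypothetical cutoff
print(summary)
```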

{"title":"Revised evaluation objectives of the Korean Dentist Clinical Skill Test: a survey study and focus group interviews","authors":"Jae-Hoon Kim, Young J Kim, Deuk-Sang Ma, Se-Hee Park, Ahran Pae, June-Sung Shim, Il-Hyung Yang, Ui-Won Jung, Byung-Joon Choi, Yang-Hyun Chun","doi":"10.3352/jeehp.2024.21.11","DOIUrl":"10.3352/jeehp.2024.21.11","url":null,"abstract":"<p><strong>Purpose: </strong>This study aimed to propose a revision of the evaluation objectives of the Korean Dentist Clinical Skill Test by analyzing the opinions of those involved in the examination after a review of those objectives.</p><p><strong>Methods: </strong>The clinical skill test objectives were reviewed based on the national-level dental practitioner competencies, dental school educational competencies, and the third dental practitioner job analysis. Current and former examinees were surveyed about their perceptions of the evaluation objectives. The validity of 22 evaluation objectives and overlapping perceptions based on area of specialty were surveyed on a 5-point Likert scale by professors who participated in the clinical skill test and dental school faculty members. Additionally, focus group interviews were conducted with experts on the examination.</p><p><strong>Results: </strong>It was necessary to consider including competency assessments for “emergency rescue skills” and “planning and performing prosthetic treatment.” There were no significant differences between current and former examinees in their perceptions of the clinical skill test’s objectives. The professors who participated in the examination and dental school faculty members recognized that most of the objectives were valid. However, some responses stated that “oromaxillofacial cranial nerve examination,” “temporomandibular disorder palpation test,” and “space management for primary and mixed dentition” were unfeasible evaluation objectives and overlapped with dental specialty areas.</p><p><strong>Conclusion: </strong>When revising the Korean Dentist Clinical Skill Test’s objectives, it is advisable to consider incorporating competency assessments related to “emergency rescue skills” and “planning and performing prosthetic treatment.”</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"21 ","pages":"11"},"PeriodicalIF":9.3,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11219220/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141176415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Reliability of a workplace-based assessment for the United States general surgical trainees’ intraoperative performance using multivariate generalizability theory: a psychometric study
IF 9.3 Q1 EDUCATION, SCIENTIFIC DISCIPLINES Pub Date : 2024-01-01 Epub Date: 2024-09-24 DOI: 10.3352/jeehp.2024.21.26
Ting Sun, Stella Yun Kim, Brigitte Kristin Smith, Yoon Soo Park

Purpose: The System for Improving and Measuring Procedure Learning (SIMPL), a smartphone-based operative assessment application, was developed to assess the intraoperative performance of surgical residents. This study aims to examine the reliability of the SIMPL assessment and determine the optimal number of procedures for a reliable assessment.

Methods: In this retrospective observational study, we analyzed data collected between 2015 and 2023 from 4,616 residents across 94 General Surgery Residency programs in the United States that utilized the SIMPL smartphone application. We employed multivariate generalizability theory and initially conducted generalizability studies to estimate the variance components associated with procedures. We then performed decision studies to estimate the reliability coefficient and the minimum number of procedures required for a reproducible assessment.

Results: We estimated that the reliability of the assessment of surgical trainees’ intraoperative autonomy and performance using SIMPL exceeded 0.70. Additionally, the optimal number of procedures required for a reproducible assessment was 10, 17, 15, and 17 for postgraduate year (PGY) 2, PGY 3, PGY 4, and PGY 5, respectively. Notably, the study highlighted that the assessment of residents in their senior years necessitated a larger number of procedures compared to those in their junior years.

Conclusion: The study demonstrated that the SIMPL assessment is reliably effective for evaluating the intraoperative performance of surgical trainees. Adjusting the number of procedures based on the trainees’ training stage enhances the assessment process’s accuracy and effectiveness.
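
To make the decision-study logic concrete, the following sketch projects a single-facet generalizability coefficient as the number of rated procedures grows and finds the smallest number reaching 0.70; the variance components are invented placeholders, and the study's multivariate analysis is more elaborate than this univariate illustration.

```python
# A minimal sketch, not the study's analysis: a single-facet decision (D) study in
# generalizability theory. Given assumed variance components for trainees (the object
# of measurement) and residual error, it projects the generalizability coefficient as
# the number of rated procedures increases. The variance components are placeholders.

def g_coefficient(var_person: float, var_error: float, n_procedures: int) -> float:
    """Generalizability (Eρ²) coefficient for the mean over n_procedures observations."""
    return var_person / (var_person + var_error / n_procedures)

var_person = 0.25   # hypothetical trainee (true-score) variance
var_error = 1.05    # hypothetical residual variance per single procedure rating

for n in range(1, 31):
    g = g_coefficient(var_person, var_error, n)
    if g >= 0.70:
        print(f"Minimum procedures for G >= 0.70: {n} (G = {g:.3f})")
        break
```

Running the same projection with variance components estimated separately for each postgraduate year is what yields different minimum procedure counts by training stage, as in the Results.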

{"title":"Reliability of a workplace-based assessment for the United States general surgical trainees’ intraoperative performance using multivariate generalizability theory: a psychometric study","authors":"Ting Sun, Stella Yun Kim, Brigitte Kristin Smith, Yoon Soo Park","doi":"10.3352/jeehp.2024.21.26","DOIUrl":"10.3352/jeehp.2024.21.26","url":null,"abstract":"<p><strong>Purpose: </strong>The System for Improving and Measuring Procedure Learning (SIMPL), a smartphone-based operative assessment application, was developed to assess the intraoperative performance of surgical residents. This study aims to examine the reliability of the SIMPL assessment and determine the optimal number of procedures for a reliable assessment.</p><p><strong>Methods: </strong>In this retrospective observational study, we analyzed data collected between 2015 and 2023 from 4,616 residents across 94 General Surgery Residency programs in the United States that utilized the SIMPL smartphone application. We employed multivariate generalizability theory and initially conducted generalizability studies to estimate the variance components associated with procedures. We then performed decision studies to estimate the reliability coefficient and the minimum number of procedures required for a reproducible assessment.</p><p><strong>Results: </strong>We estimated that the reliability of the assessment of surgical trainees’ intraoperative autonomy and performance using SIMPL exceeded 0.70. Additionally, the optimal number of procedures required for a reproducible assessment was 10, 17, 15, and 17 for postgraduate year (PGY) 2, PGY 3, PGY 4, and PGY 5, respectively. Notably, the study highlighted that the assessment of residents in their senior years necessitated a larger number of procedures compared to those in their junior years.</p><p><strong>Conclusion: </strong>The study demonstrated that the SIMPL assessment is reliably effective for evaluating the intraoperative performance of surgical trainees. Adjusting the number of procedures based on the trainees’ training stage enhances the assessment process’s accuracy and effectiveness.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"21 ","pages":"26"},"PeriodicalIF":9.3,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142356104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
GPT-4o’s competency in answering the simulated written European Board of Interventional Radiology exam compared to a medical student and experts in Germany and its ability to generate exam items on interventional radiology: a descriptive study.
IF 9.3 Q1 EDUCATION, SCIENTIFIC DISCIPLINES Pub Date : 2024-01-01 Epub Date: 2024-08-20 DOI: 10.3352/jeehp.2024.21.21
Sebastian Ebel, Constantin Ehrengut, Timm Denecke, Holger Gößmann, Anne Bettina Beeskow

Purpose: This study aimed to determine whether ChatGPT-4o, a generative artificial intelligence (AI) platform, was able to pass a simulated written European Board of Interventional Radiology (EBIR) exam and whether GPT-4o can be used to train medical students and interventional radiologists of different levels of expertise by generating exam items on interventional radiology.

Methods: GPT-4o was asked to answer 370 simulated exam items of the Cardiovascular and Interventional Radiology Society of Europe (CIRSE) for EBIR preparation (CIRSE Prep). Subsequently, GPT-4o was requested to generate exam items on interventional radiology topics at levels of difficulty suitable for medical students and the EBIR exam. Those generated items were answered by 4 participants, including a medical student, a resident, a consultant, and an EBIR holder. The correctly answered items were counted. One investigator checked the answers and items generated by GPT-4o for correctness and relevance. This work was done from April to July 2024.

Results: GPT-4o correctly answered 248 of the 370 CIRSE Prep items (67.0%). For 50 CIRSE Prep items, the medical student answered 46.0%, the resident 42.0%, the consultant 50.0%, and the EBIR holder 74.0% correctly. All participants answered 82.0% to 92.0% of the 50 GPT-4o generated items at the student level correctly. For the 50 GPT-4o items at the EBIR level, the medical student answered 32.0%, the resident 44.0%, the consultant 48.0%, and the EBIR holder 66.0% correctly. All participants could pass the GPT-4o-generated items at the student level, while the EBIR holder could pass the GPT-4o-generated items at the EBIR level. Two items (0.3%) out of 150 generated by GPT-4o were assessed as implausible.

Conclusion: GPT-4o could pass the simulated written EBIR exam and create exam items of varying difficulty to train medical students and interventional radiologists.
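
The abstract does not describe the exact prompting setup; as a rough illustration of how simulated multiple-choice items could be posed to GPT-4o programmatically and scored, here is a hedged sketch using the OpenAI Python client. The items, prompt wording, and scoring rule are assumptions, not the authors' protocol; it requires the openai package (version 1.x) and an OPENAI_API_KEY environment variable.

```python
# A minimal sketch of posing multiple-choice items to GPT-4o and tallying the correct
# answer rate. The items below are hypothetical placeholders, not CIRSE Prep content.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

items = [
    {"question": "Which access site is most commonly used for ... ?",
     "options": "A) ... B) ... C) ... D) ...",
     "answer": "A"},
    # ... further items
]

correct = 0
for item in items:
    prompt = (f"{item['question']}\n{item['options']}\n"
              "Answer with the single letter of the best option.")
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    reply = resp.choices[0].message.content.strip().upper()
    if reply.startswith(item["answer"]):
        correct += 1

print(f"Correct answer rate: {correct / len(items):.1%}")
```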

{"title":"GPT-4o’s competency in answering the simulated written European Board of Interventional Radiology exam compared to a medical student and experts in Germany and its ability to generate exam items on interventional radiology: a descriptive study.","authors":"Sebastian Ebel, Constantin Ehrengut, Timm Denecke, Holger Gößmann, Anne Bettina Beeskow","doi":"10.3352/jeehp.2024.21.21","DOIUrl":"10.3352/jeehp.2024.21.21","url":null,"abstract":"<p><strong>Purpose: </strong>This study aimed to determine whether ChatGPT-4o, a generative artificial intelligence (AI) platform, was able to pass a simulated written European Board of Interventional Radiology (EBIR) exam and whether GPT-4o can be used to train medical students and interventional radiologists of different levels of expertise by generating exam items on interventional radiology.</p><p><strong>Methods: </strong>GPT-4o was asked to answer 370 simulated exam items of the Cardiovascular and Interventional Radiology Society of Europe (CIRSE) for EBIR preparation (CIRSE Prep). Subsequently, GPT-4o was requested to generate exam items on interventional radiology topics at levels of difficulty suitable for medical students and the EBIR exam. Those generated items were answered by 4 participants, including a medical student, a resident, a consultant, and an EBIR holder. The correctly answered items were counted. One investigator checked the answers and items generated by GPT-4o for correctness and relevance. This work was done from April to July 2024.</p><p><strong>Results: </strong>GPT-4o correctly answered 248 of the 370 CIRSE Prep items (67.0%). For 50 CIRSE Prep items, the medical student answered 46.0%, the resident 42.0%, the consultant 50.0%, and the EBIR holder 74.0% correctly. All participants answered 82.0% to 92.0% of the 50 GPT-4o generated items at the student level correctly. For the 50 GPT-4o items at the EBIR level, the medical student answered 32.0%, the resident 44.0%, the consultant 48.0%, and the EBIR holder 66.0% correctly. All participants could pass the GPT-4o-generated items for the student level; while the EBIR holder could pass the GPT-4o-generated items for the EBIR level. Two items (0.3%) out of 150 generated by the GPT-4o were assessed as implausible.</p><p><strong>Conclusion: </strong>GPT-4o could pass the simulated written EBIR exam and create exam items of varying difficulty to train medical students and interventional radiologists.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"21 ","pages":"21"},"PeriodicalIF":9.3,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142005513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Medical students’ patterns of using ChatGPT as a feedback tool and perceptions of ChatGPT in a Leadership and Communication course in Korea: a cross-sectional study
Q1 EDUCATION, SCIENTIFIC DISCIPLINES Pub Date : 2023-11-10 DOI: 10.3352/jeehp.2023.20.29
Janghee Park
Purpose: This study aimed to analyze patterns of using ChatGPT before and after group activities and to explore medical students’ perceptions of ChatGPT as a feedback tool in the classroom.

Methods: The study included 99 2nd-year pre-medical students who participated in a “Leadership and Communication” course from March to June 2023. Students engaged in both individual and group activities related to negotiation strategies. ChatGPT was used to provide feedback on their solutions. A survey was administered from May 17 to 19, 2023 to assess students’ perceptions of ChatGPT’s feedback, its use in the classroom, and the strengths and challenges of ChatGPT.

Results: The students indicated that ChatGPT’s feedback was helpful, and they revised and resubmitted their group answers in various ways after receiving feedback. The majority of respondents expressed agreement with the use of ChatGPT during class. The most common response concerning the appropriate context for using ChatGPT’s feedback was “after the first round of discussion, for revisions.” There was a significant difference in satisfaction with ChatGPT’s feedback, including correctness, usefulness, and ethics, depending on whether or not ChatGPT was used during class, but there was no significant difference according to gender or whether students had previous experience with ChatGPT. The strongest advantages were “providing answers to questions” and “summarizing information,” and the worst disadvantage was “producing information without supporting evidence.”

Conclusion: The students were aware of the advantages and disadvantages of ChatGPT, and they had a positive attitude toward using ChatGPT in the classroom.
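
As a worked example of the kind of group comparison reported here (satisfaction by whether ChatGPT was used during class), the sketch below applies a Mann-Whitney U test to hypothetical 5-point Likert responses; the data and the choice of test are assumptions, since the abstract does not name the statistical procedure.

```python
# A minimal sketch (assumed data, not the study's) comparing satisfaction with
# ChatGPT's feedback between students who used ChatGPT during class and those who
# did not, using the Mann-Whitney U test as is common for ordinal survey data.
from scipy.stats import mannwhitneyu

used_in_class = [5, 4, 4, 5, 3, 4, 5, 4, 4, 5]       # hypothetical Likert ratings
not_used_in_class = [3, 3, 4, 2, 3, 4, 3, 2, 3, 4]

stat, p = mannwhitneyu(used_in_class, not_used_in_class, alternative="two-sided")
print(f"Mann-Whitney U = {stat:.1f}, P = {p:.4f}")
```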
{"title":"Medical students’ patterns of using ChatGPT as a feedback tool and perceptions of ChatGPT in a Leadership and Communication course in Korea: a cross-sectional study","authors":"Janghee Park","doi":"10.3352/jeehp.2023.20.29","DOIUrl":"https://doi.org/10.3352/jeehp.2023.20.29","url":null,"abstract":"Purpose: This study aimed to analyze patterns of using ChatGPT before and after group activities and to explore medical students’ perceptions of ChatGPT as a feedback tool in the classroom.Methods: The study included 99 2nd-year pre-medical students who participated in a “Leadership and Communication” course from March to June 2023. Students engaged in both individual and group activities related to negotiation strategies. ChatGPT was used to provide feedback on their solutions. A survey was administered to assess students’ perceptions of ChatGPT’s feedback, its use in the classroom, and the strengths and challenges of ChatGPT from May 17 to 19, 2023.Results: The students responded by indicating that ChatGPT’s feedback was helpful, and revised and resubmitted their group answers in various ways after receiving feedback. The majority of respondents expressed agreement with the use of ChatGPT during class. The most common response concerning the appropriate context of using ChatGPT’s feedback was “after the first round of discussion, for revisions.” There was a significant difference in satisfaction with ChatGPT’s feedback, including correctness, usefulness, and ethics, depending on whether or not ChatGPT was used during class, but there was no significant difference according to gender or whether students had previous experience with ChatGPT. The strongest advantages were “providing answers to questions” and “summarizing information,” and the worst disadvantage was “producing information without supporting evidence.”Conclusion: The students were aware of the advantages and disadvantages of ChatGPT, and they had a positive attitude toward using ChatGPT in the classroom.","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"99 27","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135092034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Can an artificial intelligence chatbot be the author of a scholarly article?
Q1 EDUCATION, SCIENTIFIC DISCIPLINES Pub Date : 2023-02-27 DOI: 10.3352/jeehp.2022.20.6
Ju Yoen Lee
At the end of 2022, the appearance of ChatGPT, an artificial intelligence (AI) chatbot with amazing writing ability, caused a great sensation in academia. The chatbot turned out to be very capable, but also capable of deception, and the news broke that several researchers had listed the chatbot (including its earlier version) as co-authors of their academic papers. In response, Nature and Science expressed their position that this chatbot cannot be listed as an author in the papers they publish. Since an AI chatbot is not a human being, in the current legal system, the text automatically generated by an AI chatbot cannot be a copyrighted work; thus, an AI chatbot cannot be an author of a copyrighted work. Current AI chatbots such as ChatGPT are much more advanced than search engines in that they produce original text, but they still remain at the level of a search engine in that they cannot take responsibility for their writing. For this reason, they also cannot be authors from the perspective of research ethics.
{"title":"Can an artificial intelligence chatbot be the author of a scholarly article?","authors":"Ju Yoen Lee","doi":"10.3352/jeehp.2022.20.6","DOIUrl":"https://doi.org/10.3352/jeehp.2022.20.6","url":null,"abstract":"At the end of 2022, the appearance of ChatGPT, an artificial intelligence (AI) chatbot with amazing writing ability, caused a great sensation in academia. The chatbot turned out to be very capable, but also capable of deception, and the news broke that several researchers had listed the chatbot (including its earlier version) as co-authors of their academic papers. In response, Nature and Science expressed their position that this chatbot cannot be listed as an author in the papers they publish. Since an AI chatbot is not a human being, in the current legal system, the text automatically generated by an AI chatbot cannot be a copyrighted work; thus, an AI chatbot cannot be an author of a copyrighted work. Current AI chatbots such as ChatGPT are much more advanced than search engines in that they produce original text, but they still remain at the level of a search engine in that they cannot take responsibility for their writing. For this reason, they also cannot be authors from the perspective of research ethics.","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135892320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Are ChatGPT’s knowledge and interpretation ability comparable to those of medical students in Korea for taking a parasitology examination?: a descriptive study
IF 4.4 Q1 EDUCATION, SCIENTIFIC DISCIPLINES Pub Date : 2023-01-11 DOI: 10.3352/jeehp.2023.20.01
Sun Huh
This study aimed to compare the knowledge and interpretation ability of ChatGPT, a language model of artificial general intelligence, with those of medical students in Korea by administering a parasitology examination to both ChatGPT and medical students. The examination consisted of 79 items and was administered to ChatGPT on January 1, 2023. The examination results were analyzed in terms of ChatGPT’s overall performance score, its correct answer rate by the items’ knowledge level, and the acceptability of its explanations of the items. ChatGPT’s performance was lower than that of the medical students, and ChatGPT’s correct answer rate was not related to the items’ knowledge level. However, there was a relationship between acceptable explanations and correct answers. In conclusion, ChatGPT’s knowledge and interpretation ability for this parasitology examination were not yet comparable to those of medical students in Korea.
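
The reported relationship between acceptable explanations and correct answers can be checked with a simple 2x2 association test; the sketch below uses Fisher's exact test on invented counts, since the abstract does not report the underlying table or the test used.

```python
# A minimal sketch of testing the association between acceptable explanations and
# correct answers. The counts are hypothetical, not the study's data.
from scipy.stats import fisher_exact

#                     explanation acceptable | not acceptable
table = [[40, 8],    # answer correct
         [12, 19]]   # answer incorrect

odds_ratio, p = fisher_exact(table)
print(f"Odds ratio = {odds_ratio:.2f}, P = {p:.4f}")
```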
{"title":"Are ChatGPT’s knowledge and interpretation ability comparable to those of medical students in Korea for taking a parasitology examination?: a descriptive study","authors":"Sun Huh","doi":"10.3352/jeehp.2023.20.01","DOIUrl":"https://doi.org/10.3352/jeehp.2023.20.01","url":null,"abstract":"This study aimed to compare the knowledge and interpretation ability of ChatGPT, a language model of artificial general intelligence, with those of medical students in Korea by administering a parasitology examination to both ChatGPT and medical students. The examination consisted of 79 items and was administered to ChatGPT on January 1, 2023. The examination results were analyzed in terms of ChatGPT’s overall performance score, its correct answer rate by the items’ knowledge level, and the acceptability of its explanations of the items. ChatGPT’s performance was lower than that of the medical students, and ChatGPT’s correct answer rate was not related to the items’ knowledge level. However, there was a relationship between acceptable explanations and correct answers. In conclusion, ChatGPT’s knowledge and interpretation ability for this parasitology examination were not yet comparable to those of medical students in Korea.","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"20 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2023-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45226107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 117
Enhancement of the technical and non-technical skills of nurse anesthesia students using the Anesthetic List Management Assessment Tool in Iran: a quasi-experimental study.
IF 4.4 Q1 EDUCATION, SCIENTIFIC DISCIPLINES Pub Date : 2023-01-01 DOI: 10.3352/jeehp.2023.20.19
Ali Khalafi, Maedeh Kordnejad, Vahid Saidkhani

Purpose: This study investigated the effect of evaluations based on the Anesthetic List Management Assessment Tool (ALMAT) form on improving the technical and non-technical skills of final-year nurse anesthesia students at Ahvaz Jundishapur University of Medical Sciences (AJUMS).

Methods: This was a quasi-experimental study with a pre-test and post-test design. It included 45 final-year nurse anesthesia students of AJUMS and lasted for 3 months. The technical and non-technical skills of the intervention group were assessed at 4 university hospitals using formative-feedback evaluation based on the ALMAT form, from induction of anesthesia until reaching mastery and independence. Finally, the students’ degree of improvement in technical and non-technical skills was compared between the intervention and control groups. Statistical tests (the independent t-test, paired t-test, and Mann-Whitney test) were used to analyze the data.

Results: The rate of improvement in post-test scores of technical skills was significantly higher in the intervention group than in the control group (P<0.0001). Similarly, the students in the intervention group received significantly higher post-test scores for non-technical skills than the students in the control group (P<0.0001).

Conclusion: The findings of this study showed that the use of ALMAT as a formative-feedback evaluation method to evaluate technical and non-technical skills had a significant effect on improving these skills and was effective in helping students learn and reach mastery and independence.
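
The comparisons described in the Methods and Results can be illustrated with standard t-tests; the sketch below runs a paired t-test (pre vs. post within the intervention group) and an independent t-test (intervention vs. control post-test) on invented scores.

```python
# A minimal sketch (with invented scores, not the study's data) of the statistical
# comparisons named in the abstract: a paired t-test for pre- vs. post-test scores
# within a group and an independent t-test between groups.
import numpy as np
from scipy.stats import ttest_rel, ttest_ind

pre_intervention = np.array([55, 60, 48, 62, 58, 51, 64, 59])    # hypothetical scores
post_intervention = np.array([78, 82, 70, 85, 80, 74, 88, 81])
post_control = np.array([60, 63, 55, 66, 61, 58, 67, 62])

t_paired, p_paired = ttest_rel(pre_intervention, post_intervention)
t_ind, p_ind = ttest_ind(post_intervention, post_control)

print(f"Paired t-test (pre vs. post, intervention): t = {t_paired:.2f}, P = {p_paired:.4f}")
print(f"Independent t-test (post, intervention vs. control): t = {t_ind:.2f}, P = {p_ind:.4f}")
```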

{"title":"Enhancement of the technical and non-technical skills of nurse anesthesia students using the Anesthetic List Management Assessment Tool in Iran: a quasi-experimental study.","authors":"Ali Khalafi,&nbsp;Maedeh Kordnejad,&nbsp;Vahid Saidkhani","doi":"10.3352/jeehp.2023.20.19","DOIUrl":"https://doi.org/10.3352/jeehp.2023.20.19","url":null,"abstract":"<p><strong>Purpose: </strong>This study investigated the effect of evaluations based on the Anesthetic List Management Assessment Tool (ALMAT) form on improving the technical and non-technical skills of final-year nurse anesthesia students at Ahvaz Jundishapur University of Medical Sciences (AJUMS).</p><p><strong>Methods: </strong>This was a semi-experimental study with a pre-test and post-test design. It included 45 final-year nurse anesthesia students of AJUMS and lasted for 3 months. The technical and non-technical skills of the intervention group were assessed at 4 university hospitals using formative-feedback evaluation based on\u0000the ALMAT form, from induction of anesthesia until reaching mastery and independence. Finally, the students’ degree of improvement in technical and non-technical skills was compared between the intervention and control groups. Statistical tests (the independent t-test, paired t-test, and Mann-Whitney test) were used to analyze the data.</p><p><strong>Results: </strong>The rate of improvement in post-test scores of technical skills was significantly higher in the intervention group than in the control group (P<0.0001). Similarly, the students in the intervention group received significantly higher post-test scores for non-technical skills than the students in the control group (P<0.0001).</p><p><strong>Conclusion: </strong>The findings of this study showed that the use of ALMAT as a formative-feedback evaluation method to evaluate technical and non-technical skills had a significant effect on improving these skills and was effective in helping students learn and reach mastery and independence.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"20 ","pages":"19"},"PeriodicalIF":4.4,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10352009/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9833205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
What impacts students' satisfaction the most from Medicine Student Experience Questionnaire in Australia: a validity study.
IF 4.4 Q1 EDUCATION, SCIENTIFIC DISCIPLINES Pub Date : 2023-01-01 DOI: 10.3352/jeehp.2023.20.2
Pin-Hsiang Huang, Gary Velan, Greg Smith, Melanie Fentoullis, Sean Edward Kennedy, Karen Jane Gibson, Kerry Uebel, Boaz Shulruf

Purpose: This study evaluated the validity of student feedback derived from the Medicine Student Experience Questionnaire (MedSEQ), as well as the predictors of students' satisfaction in the Medicine program.

Methods: Data from MedSEQ administered in the University of New South Wales Medicine program in 2017, 2019, and 2021 were analyzed. Confirmatory factor analysis (CFA) and Cronbach's α were used to assess the construct validity and reliability of MedSEQ, respectively. Hierarchical multiple linear regressions were used to identify the factors that most impact students' overall satisfaction with the program.

Results: A total of 1,719 students (34.50%) responded to MedSEQ. CFA showed good fit indices (root mean square error of approximation=0.051; comparative fit index=0.939; chi-square/degrees of freedom=6.429). All factors yielded good (α>0.7) or very good (α>0.8) levels of reliability, except the "online resources" factor, which had acceptable reliability (α=0.687). A multiple linear regression model with only demographic characteristics explained 3.8% of the variance in students' overall satisfaction, whereas the model adding 8 domains from MedSEQ explained 40%, indicating that 36.2% of the variance was attributable to students' experience across the 8 domains. Three domains had the strongest impact on overall satisfaction: "being cared for," "satisfaction with teaching," and "satisfaction with assessment" (β=0.327, 0.148, 0.148, respectively; all with P<0.001).

Conclusion: MedSEQ has good construct validity and high reliability, reflecting students' satisfaction with the Medicine program. Key factors impacting students' satisfaction are the perception of being cared for, quality teaching irrespective of the mode of delivery, and fair assessment tasks that enhance learning.
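
The hierarchical regression result (3.8% of variance explained by demographics alone versus 40% after adding the 8 MedSEQ domains) amounts to comparing R² between nested models; the sketch below shows that comparison on simulated data with assumed column names, not the study's dataset.

```python
# A minimal sketch of a hierarchical (nested-model) regression: fit a model with
# demographic predictors only, then add 8 domain scores, and report the change in R².
# Data, column names, and effect sizes are simulated for illustration.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 35, n),
    "gender": rng.integers(0, 2, n),
    **{f"domain_{i}": rng.normal(3.5, 0.8, n) for i in range(1, 9)},
})
df["overall_satisfaction"] = (
    0.4 * df["domain_1"] + 0.2 * df["domain_3"] + rng.normal(0, 1, n)
)

demographics = ["age", "gender"]
domains = [f"domain_{i}" for i in range(1, 9)]

model_1 = sm.OLS(df["overall_satisfaction"], sm.add_constant(df[demographics])).fit()
model_2 = sm.OLS(df["overall_satisfaction"], sm.add_constant(df[demographics + domains])).fit()

print(f"R² (demographics only): {model_1.rsquared:.3f}")
print(f"R² (demographics + domains): {model_2.rsquared:.3f}")
print(f"ΔR² attributable to the 8 domains: {model_2.rsquared - model_1.rsquared:.3f}")
```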

{"title":"What impacts students' satisfaction the most from Medicine Student Experience Questionnaire in Australia: a validity study.","authors":"Pin-Hsiang Huang,&nbsp;Gary Velan,&nbsp;Greg Smith,&nbsp;Melanie Fentoullis,&nbsp;Sean Edward Kennedy,&nbsp;Karen Jane Gibson,&nbsp;Kerry Uebel,&nbsp;Boaz Shulruf","doi":"10.3352/jeehp.2023.20.2","DOIUrl":"https://doi.org/10.3352/jeehp.2023.20.2","url":null,"abstract":"<p><strong>Purpose: </strong>This study evaluated the validity of student feedback derived from Medicine Student Experience Questionnaire (MedSEQ), as well as the predictors of students' satisfaction in the Medicine program.</p><p><strong>Methods: </strong>Data from MedSEQ applying to the University of New South Wales Medicine program in 2017, 2019, and 2021 were analyzed. Confirmatory factor analysis (CFA) and Cronbach's α were used to assess the construct validity and reliability of MedSEQ respectively. Hierarchical multiple linear regressions were used to identify the factors that most impact students' overall satisfaction with the program.</p><p><strong>Results: </strong>A total of 1,719 students (34.50%) responded to MedSEQ. CFA showed good fit indices (root mean square error of approximation=0.051; comparative fit index=0.939; chi-square/degrees of freedom=6.429). All factors yielded good (α>0.7) or very good (α>0.8) levels of reliability, except the \"online resources\" factor, which had acceptable reliability (α=0.687). A multiple linear regression model with only demographic characteristics explained 3.8% of the variance in students' overall satisfaction, whereas the model adding 8 domains from MedSEQ explained 40%, indicating that 36.2% of the variance was attributable to students' experience across the 8 domains. Three domains had the strongest impact on overall satisfaction: \"being cared for,\" \"satisfaction with teaching,\" and \"satisfaction with assessment\" (β=0.327, 0.148, 0.148, respectively; all with P<0.001).</p><p><strong>Conclusion: </strong>MedSEQ has good construct validity and high reliability, reflecting students' satisfaction with the Medicine program. Key factors impacting students' satisfaction are the perception of being cared for, quality teaching irrespective of the mode of delivery and fair assessment tasks which enhance learning.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"20 ","pages":"2"},"PeriodicalIF":4.4,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9986309/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10866573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Efficacy and limitations of ChatGPT as a biostatistical problem-solving tool in medical education in Serbia: a descriptive study
IF 4.4 Q1 EDUCATION, SCIENTIFIC DISCIPLINES Pub Date : 2023-01-01 Epub Date: 2023-10-16 DOI: 10.3352/jeehp.2023.20.28
Aleksandra Ignjatović, Lazar Stevanović
Purpose: This study aimed to assess the performance of ChatGPT (GPT-3.5 and GPT-4) as a study tool in solving biostatistical problems and to identify any potential drawbacks that might arise from using ChatGPT in medical education, particularly in solving practical biostatistical problems.

Methods: In this descriptive study, ChatGPT was tested to evaluate its ability to solve biostatistical problems from the Handbook of Medical Statistics by Peacock and Peacock. Tables from the problems were transformed into textual questions. Ten biostatistical problems were randomly chosen and used as text-based input for conversation with ChatGPT (versions 3.5 and 4).

Results: GPT-3.5 solved 5 practical problems in the first attempt, related to categorical data, cross-sectional study, measuring reliability, probability properties, and the t-test. GPT-3.5 failed to provide correct answers regarding analysis of variance, the chi-square test, and sample size within 3 attempts. GPT-4 also solved a task related to the confidence interval in the first attempt and solved all questions within 3 attempts, with precise guidance and monitoring.

Conclusion: The assessment of both versions of ChatGPT on 10 biostatistical problems revealed that GPT-3.5 and GPT-4 performed below average, with correct response rates of 5 and 6 out of 10 on the first attempt. GPT-4 succeeded in providing all correct answers within 3 attempts. These findings indicate that students must be aware that this tool, even when providing and calculating different statistical analyses, can be wrong, and they should be aware of ChatGPT’s limitations and be careful when incorporating this model into medical education.
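
To show what solving such a textbook biostatistical problem looks like when done directly in code rather than by ChatGPT, the sketch below computes a 95% confidence interval for a mean and an independent-samples t-test on invented numbers; the specific problems from the handbook are not reproduced here.

```python
# A minimal sketch of the kind of problem the study fed to ChatGPT, solved in Python:
# a 95% confidence interval for a mean and an independent-samples t-test.
# The numbers are invented, not taken from the Handbook of Medical Statistics.
import numpy as np
from scipy import stats

sample = np.array([5.1, 4.8, 5.6, 5.0, 4.7, 5.3, 5.2, 4.9, 5.4, 5.0])

# 95% CI for the mean using the t distribution
mean = sample.mean()
sem = stats.sem(sample)
ci_low, ci_high = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)
print(f"Mean = {mean:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")

# Independent-samples t-test between two groups
group_a = np.array([5.1, 4.8, 5.6, 5.0, 4.7])
group_b = np.array([5.9, 6.1, 5.7, 6.0, 5.8])
t, p = stats.ttest_ind(group_a, group_b)
print(f"t = {t:.2f}, P = {p:.4f}")
```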
{"title":"Efficacy and limitations of ChatGPT as a biostatistical problem-solving tool in medical education in Serbia: a descriptive study","authors":"Aleksandra Ignjatović, Lazar Stevanović","doi":"10.3352/jeehp.2023.20.28","DOIUrl":"10.3352/jeehp.2023.20.28","url":null,"abstract":"Purpose This study aimed to assess the performance of ChatGPT (GPT-3.5 and GPT-4) as a study tool in solving biostatistical problems and to identify any potential drawbacks that might arise from using ChatGPT in medical education, particularly in solving practical biostatistical problems. Methods ChatGPT was tested to evaluate its ability to solve biostatistical problems from the Handbook of Medical Statistics by Peacock and Peacock in this descriptive study. Tables from the problems were transformed into textual questions. Ten biostatistical problems were randomly chosen and used as text-based input for conversation with ChatGPT (versions 3.5 and 4). Results GPT-3.5 solved 5 practical problems in the first attempt, related to categorical data, cross-sectional study, measuring reliability, probability properties, and the t-test. GPT-3.5 failed to provide correct answers regarding analysis of variance, the chi-square test, and sample size within 3 attempts. GPT-4 also solved a task related to the confidence interval in the first attempt and solved all questions within 3 attempts, with precise guidance and monitoring. Conclusion The assessment of both versions of ChatGPT performance in 10 biostatistical problems revealed that GPT-3.5 and 4’s performance was below average, with correct response rates of 5 and 6 out of 10 on the first attempt. GPT-4 succeeded in providing all correct answers within 3 attempts. These findings indicate that students must be aware that this tool, even when providing and calculating different statistical analyses, can be wrong, and they should be aware of ChatGPT’s limitations and be careful when incorporating this model into medical education.","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"20 ","pages":"28"},"PeriodicalIF":4.4,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10646144/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41239759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0