Alexa, write my exam: ChatGPT for MCQ creation
Stephen D. Schneid, Chris Armour, Sean Evans, Katharina Brandl
Medical Education, 58(11), 1373–1374 (2024). DOI: 10.1111/medu.15496
Writing high-quality exam questions requires substantial faculty development and, more importantly, diverts time from other significant educational responsibilities. Recent research has demonstrated the efficiency of ChatGPT in generating multiple-choice questions (MCQs) and its ability to pass all three Steps of the United States Medical Licensing Examination (USMLE).1 Given the potential of new artificial intelligence systems like ChatGPT, this study aims to explore their use in streamlining item writing without compromising the desirable psychometric properties of assessments.
ChatGPT 3.5 was prompted to ‘write 25 MCQs with clinical vignette in USMLE Step 1 style on the pharmacology of antibiotics, antivirals and antiparasitic drugs addressing their indications, mechanism of action, adverse effects and contraindications’. Faculty reviewed all questions for accuracy and made minor modifications. For questions that did not align with the courses' learning objectives, ChatGPT was prompted to generate alternatives, such as ‘another question on the Pharmacology of HIV drugs’. Additionally, 25 MCQs were created without the help of ChatGPT. ChatGPT-assisted question writing took approximately 1 hour (with adjustments and corrections) compared with 9 hours without the help of ChatGPT.
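For readers who want to reproduce this kind of item drafting programmatically rather than through the ChatGPT web interface, a minimal sketch using the OpenAI Python SDK is shown below. The model name, client configuration and output handling are illustrative assumptions and are not details reported in this study.

```python
# Hypothetical sketch: issuing the item-writing prompt through the OpenAI
# Python SDK rather than the ChatGPT web interface. Model name, client set-up
# and output handling are assumptions, not details reported in the study.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    "Write 25 MCQs with clinical vignette in USMLE Step 1 style on the "
    "pharmacology of antibiotics, antivirals and antiparasitic drugs "
    "addressing their indications, mechanism of action, adverse effects "
    "and contraindications."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # assumed stand-in for 'ChatGPT 3.5'
    messages=[{"role": "user", "content": prompt}],
)

draft_items = response.choices[0].message.content
print(draft_items)  # draft MCQs, still to be reviewed and edited by faculty
```

As in the study, any output generated this way would still require expert review before use in an exam.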
Seventy-one second-year pharmacy students were assessed in Spring 2023 with a 50-item exam consisting of 25 ChatGPT-constructed and 25 faculty-generated MCQs. We compared the difficulty and psychometric characteristics of the ChatGPT-assisted and non-assisted questions using descriptive statistics, Student's t-tests and the Mann–Whitney U test.
Students' performance on MCQs generated by ChatGPT was not significantly different from that on faculty-generated items for average score (76.44%, SD = 16.71 for ChatGPT vs. 82.52%, SD = 10.90 for faculty), discrimination index (0.29, SD = 0.15 for ChatGPT vs. 0.25, SD = 0.17 for faculty) or point-biserial correlation (0.31, SD = 0.13 for ChatGPT vs. 0.28, SD = 0.15 for faculty). Students took longer on average to answer ChatGPT-generated questions than faculty-generated questions (71 seconds, SD = 22 vs. 58 seconds, SD = 25; p < 0.05), likely due to the prevalence of ‘window dressing’, the inclusion of vignette details that are not needed to answer the question. This flaw was identified in 40% of the ChatGPT-generated questions and may explain the additional time required.
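As a rough illustration of this kind of item analysis, the sketch below computes per-item difficulty, an upper-lower 27% discrimination index and the item-total point-biserial correlation, then compares the two item sets with Student's t-test and the Mann–Whitney U test using SciPy. The response matrix, the 27% split and all variable names are assumptions for illustration, not the authors' data or code.

```python
# Illustrative item analysis, not the authors' code. Assumes `responses` is a
# (students x items) 0/1 NumPy array and that the first 25 columns are the
# ChatGPT-assisted items; placeholder random data stand in for real scores.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
responses = rng.integers(0, 2, size=(71, 50))  # 71 students x 50 items (placeholder)
is_gpt = np.array([True] * 25 + [False] * 25)  # assumed item ordering

total = responses.sum(axis=1)                  # each student's total score
difficulty = responses.mean(axis=0)            # proportion correct per item

# Discrimination index: upper versus lower 27% of students by total score.
cut = int(round(0.27 * responses.shape[0]))
order = np.argsort(total)
lower, upper = order[:cut], order[-cut:]
discrimination = responses[upper].mean(axis=0) - responses[lower].mean(axis=0)

# Point-biserial correlation of each item with the total score.
point_biserial = np.array(
    [stats.pointbiserialr(responses[:, i], total)[0] for i in range(responses.shape[1])]
)

# Compare ChatGPT-assisted and faculty-written items on each index.
for name, values in [("difficulty", difficulty),
                     ("discrimination", discrimination),
                     ("point-biserial", point_biserial)]:
    t_res = stats.ttest_ind(values[is_gpt], values[~is_gpt])
    u_res = stats.mannwhitneyu(values[is_gpt], values[~is_gpt])
    print(f"{name}: t-test p = {t_res.pvalue:.3f}, Mann-Whitney p = {u_res.pvalue:.3f}")
```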
We learned that while ChatGPT can effectively generate high-quality MCQs, saving time in the process, careful review by content experts is necessary to ensure the quality of the questions, particularly to identify and correct ‘window dressing’ flaws commonly found in ChatGPT-generated items.
We will present these data at upcoming faculty development sessions to promote the adoption of ChatGPT for generating exam questions. By presenting robust data demonstrating ChatGPT's efficacy, we believe that more faculty will integrate this tool into their question-writing processes. Faculty will also be alerted to potential question flaws and prepared to address them.
Additionally, although students often desire more practice questions, we discovered that they are generally unfamiliar with using ChatGPT to generate them. We plan to empower students to use ChatGPT to assist with their studies, while concurrently training faculty to become more adept at using it to generate both practice and test items.
The authors declare that they have no conflict of interest.
Journal introduction:
Medical Education seeks to be the pre-eminent journal in the field of education for health care professionals, and publishes material of the highest quality, reflecting worldwide or provocative issues and perspectives.
The journal welcomes high-quality papers on all aspects of health professional education, including:
- undergraduate education
- postgraduate training
- continuing professional development
- interprofessional education