Kevin Waugh, Mark Slaymaker, Marian Petre, John Woodthorpe, Daniel Gooch
{"title":"英国开放大学计算与通信学院,米尔顿凯恩斯,MK7 6AA","authors":"Kevin Waugh, Mark Slaymaker, Marian Petre, John Woodthorpe, Daniel Gooch","doi":"10.1145/3633287","DOIUrl":null,"url":null,"abstract":"<p>Cheating has been a long standing issue in university assessments. However, the release of ChatGPT and other free-to-use generative AI tools have provided a new and distinct method for cheating. Students can run many assessment questions through the tool and generate a superficially compelling answer, which may or may not be accurate. We ran a dual-anonymous “quality assurance” marking exercise across four end-of-module assessments across a distance university CS curriculum. Each marker received five ChatGPT-generated scripts alongside 10 student scripts. A total of 90 scripts were marked; every ChatGPT-generated script for the undergraduate modules received at least a passing grade (>40%), with all of the introductory module CS1 scripts receiving a distinction (>85%). None of the ChatGPT taught postgraduate scripts received a passing grade (>50%). We also present the results of interviewing the markers, and of running our sample scripts through a GPT-2 detector and the TurnItIn AI detector which both identified every ChatGPT-generated script, but differed in the number of false-positives. As such, we contribute a baseline understanding of how the public release of generative AI is likely to significantly impact quality assurance processes. Our analysis demonstrates that, in most cases, across a range of question formats, topics and study levels, ChatGPT is at least capable of producing adequate answers for undergraduate assessment.</p>","PeriodicalId":48764,"journal":{"name":"ACM Transactions on Computing Education","volume":"35 6","pages":""},"PeriodicalIF":3.2000,"publicationDate":"2023-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"School of Computing and Communications, The Open University, Milton Keynes, MK7 6AA, UK\",\"authors\":\"Kevin Waugh, Mark Slaymaker, Marian Petre, John Woodthorpe, Daniel Gooch\",\"doi\":\"10.1145/3633287\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Cheating has been a long standing issue in university assessments. However, the release of ChatGPT and other free-to-use generative AI tools have provided a new and distinct method for cheating. Students can run many assessment questions through the tool and generate a superficially compelling answer, which may or may not be accurate. We ran a dual-anonymous “quality assurance” marking exercise across four end-of-module assessments across a distance university CS curriculum. Each marker received five ChatGPT-generated scripts alongside 10 student scripts. A total of 90 scripts were marked; every ChatGPT-generated script for the undergraduate modules received at least a passing grade (>40%), with all of the introductory module CS1 scripts receiving a distinction (>85%). None of the ChatGPT taught postgraduate scripts received a passing grade (>50%). We also present the results of interviewing the markers, and of running our sample scripts through a GPT-2 detector and the TurnItIn AI detector which both identified every ChatGPT-generated script, but differed in the number of false-positives. As such, we contribute a baseline understanding of how the public release of generative AI is likely to significantly impact quality assurance processes. Our analysis demonstrates that, in most cases, across a range of question formats, topics and study levels, ChatGPT is at least capable of producing adequate answers for undergraduate assessment.</p>\",\"PeriodicalId\":48764,\"journal\":{\"name\":\"ACM Transactions on Computing Education\",\"volume\":\"35 6\",\"pages\":\"\"},\"PeriodicalIF\":3.2000,\"publicationDate\":\"2023-11-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Computing Education\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://doi.org/10.1145/3633287\",\"RegionNum\":3,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"EDUCATION, SCIENTIFIC DISCIPLINES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Computing Education","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.1145/3633287","RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EDUCATION, SCIENTIFIC DISCIPLINES","Score":null,"Total":0}
School of Computing and Communications, The Open University, Milton Keynes, MK7 6AA, UK
Cheating has been a long standing issue in university assessments. However, the release of ChatGPT and other free-to-use generative AI tools have provided a new and distinct method for cheating. Students can run many assessment questions through the tool and generate a superficially compelling answer, which may or may not be accurate. We ran a dual-anonymous “quality assurance” marking exercise across four end-of-module assessments across a distance university CS curriculum. Each marker received five ChatGPT-generated scripts alongside 10 student scripts. A total of 90 scripts were marked; every ChatGPT-generated script for the undergraduate modules received at least a passing grade (>40%), with all of the introductory module CS1 scripts receiving a distinction (>85%). None of the ChatGPT taught postgraduate scripts received a passing grade (>50%). We also present the results of interviewing the markers, and of running our sample scripts through a GPT-2 detector and the TurnItIn AI detector which both identified every ChatGPT-generated script, but differed in the number of false-positives. As such, we contribute a baseline understanding of how the public release of generative AI is likely to significantly impact quality assurance processes. Our analysis demonstrates that, in most cases, across a range of question formats, topics and study levels, ChatGPT is at least capable of producing adequate answers for undergraduate assessment.
期刊介绍:
ACM Transactions on Computing Education (TOCE) (formerly named JERIC, Journal on Educational Resources in Computing) covers diverse aspects of computing education: traditional computer science, computer engineering, information technology, and informatics; emerging aspects of computing; and applications of computing to other disciplines. The common characteristics shared by these papers are a scholarly approach to teaching and learning, a broad appeal to educational practitioners, and a clear connection to student learning.