School of Computing and Communications, The Open University, Milton Keynes, MK7 6AA, UK

ACM Transactions on Computing Education; IF 3.2; Zone 3 (Engineering and Technology); JCR Q1 (Education, Scientific Disciplines); Pub Date: 2023-11-21; DOI: 10.1145/3633287

Kevin Waugh, Mark Slaymaker, Marian Petre, John Woodthorpe, Daniel Gooch



Abstract

Cheating has been a long-standing issue in university assessments. However, the release of ChatGPT and other free-to-use generative AI tools has provided a new and distinct method for cheating. Students can run many assessment questions through the tool and generate a superficially compelling answer, which may or may not be accurate. We ran a dual-anonymous “quality assurance” marking exercise on four end-of-module assessments drawn from a distance university's CS curriculum. Each marker received five ChatGPT-generated scripts alongside 10 student scripts. A total of 90 scripts were marked; every ChatGPT-generated script for the undergraduate modules received at least a passing grade (>40%), and all of the scripts for the introductory module, CS1, received a distinction (>85%). None of the ChatGPT-generated scripts for the taught postgraduate modules received a passing grade (>50%). We also present the results of interviewing the markers, and of running our sample scripts through a GPT-2 detector and the TurnItIn AI detector, both of which identified every ChatGPT-generated script but differed in the number of false positives. As such, we contribute a baseline understanding of how the public release of generative AI is likely to significantly impact quality assurance processes. Our analysis demonstrates that, in most cases, across a range of question formats, topics and study levels, ChatGPT is at least capable of producing adequate answers for undergraduate assessment.


Source journal

ACM Transactions on Computing Education (Education, Scientific Disciplines)

CiteScore: 6.50

Self-citation rate: 16.70%

Journal description: ACM Transactions on Computing Education (TOCE) (formerly named JERIC, Journal on Educational Resources in Computing) covers diverse aspects of computing education: traditional computer science, computer engineering, information technology, and informatics; emerging aspects of computing; and applications of computing to other disciplines. The common characteristics shared by these papers are a scholarly approach to teaching and learning, a broad appeal to educational practitioners, and a clear connection to student learning.