Race with the machines: Assessing the capability of generative AI in solving authentic assessments

IF 3.3 | JCR Q1, EDUCATION & EDUCATIONAL RESEARCH | CAS Region 3 (Education)
Australasian Journal of Educational Technology | Pub Date: 2023-12-22 | DOI: 10.14742/ajet.8902
Binh Nguyen Thanh, Diem Thi-Ngoc Vo, Minh Nguyen Nhat, Thi Thu Tra Pham, Hieu Thai Trung, Son Ha Xuan

Abstract

In this study, we introduce a framework designed to help educators assess the effectiveness of popular generative artificial intelligence (AI) tools in solving authentic assessments. We employed Bloom's taxonomy as a guiding principle to create authentic assessments that evaluate the capabilities of generative AI tools. We applied this framework to assess the abilities of ChatGPT-4, ChatGPT-3.5, Google Bard and Microsoft Bing in solving authentic assessments in economics. We found that generative AI tools perform very well at the lower levels of Bloom's taxonomy while still maintaining a decent level of performance at the higher levels, with "create" being the weakest level of performance. Interestingly, these tools are better able to address numeric-based questions than text-based ones. Moreover, all the generative AI tools exhibit weaknesses in building arguments based on theoretical frameworks, maintaining the coherence of different arguments and providing appropriate references. Our study provides educators with a framework to assess the capabilities of generative AI tools, enabling them to make more informed decisions regarding assessments and learning activities. Our findings demand a strategic reimagining of educational goals and assessments, emphasising higher cognitive skills and calling for a concerted effort to enhance the capabilities of educators in preparing students for a rapidly transforming professional environment.

Implications for practice or policy

- Our proposed framework enables educators to systematically evaluate the capabilities of widely used generative AI tools in assessments and assists them in the assessment design process.
- Tertiary institutions should re-evaluate and redesign programmes and course learning outcomes. The new focus on learning outcomes should address the higher levels of educational goals of Bloom's taxonomy, specifically the "create" level.
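The evaluation workflow the abstract describes — grading each tool's answers to assessment tasks pitched at each level of Bloom's taxonomy, then comparing average performance per level — could be tabulated along these lines. This is a minimal sketch, not the authors' instrument: the tool names come from the study, but the `mean_scores` helper, the 0–10 rubric scale and the sample ratings are hypothetical illustrations.

```python
from collections import defaultdict
from statistics import mean

# The six cognitive levels of the revised Bloom's taxonomy,
# from lowest to highest.
BLOOM_LEVELS = ["remember", "understand", "apply", "analyse", "evaluate", "create"]

def mean_scores(ratings):
    """Average rubric scores per (tool, Bloom level) pair.

    ratings: iterable of (tool, level, score) tuples, one per graded answer.
    Returns a dict mapping (tool, level) to the mean score.
    """
    buckets = defaultdict(list)
    for tool, level, score in ratings:
        buckets[(tool, level)].append(score)
    return {key: mean(vals) for key, vals in buckets.items()}

# Hypothetical rubric scores (0-10); not data from the study.
ratings = [
    ("ChatGPT-4", "remember", 9), ("ChatGPT-4", "remember", 10),
    ("ChatGPT-4", "create", 5),
    ("Google Bard", "remember", 8), ("Google Bard", "create", 4),
]
print(mean_scores(ratings)[("ChatGPT-4", "remember")])  # 9.5
```

Aggregating this way makes the pattern the study reports (strong at "remember", weakest at "create") directly comparable across tools.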
Source journal: Australasian Journal of Educational Technology (Education & Educational Research)
CiteScore: 7.60 | Self-citation rate: 7.30% | Articles per year: 54 | Review time: 36 weeks