Race with the machines: Assessing the capability of generative AI in solving authentic assessments

IF 3.3 | JCR Q1, EDUCATION & EDUCATIONAL RESEARCH | CAS Region 3 (Education)
Australasian Journal of Educational Technology | Pub Date: 2023-12-22 | DOI: 10.14742/ajet.8902
Binh Nguyen Thanh, Diem Thi-Ngoc Vo, Minh Nguyen Nhat, Thi Thu Tra Pham, Hieu Thai Trung, Son Ha Xuan

Abstract

In this study, we introduce a framework designed to help educators assess the effectiveness of popular generative artificial intelligence (AI) tools in solving authentic assessments. We employed Bloom's taxonomy as a guiding principle to create authentic assessments that evaluate the capabilities of generative AI tools. We applied this framework to assess the abilities of ChatGPT-4, ChatGPT-3.5, Google Bard and Microsoft Bing in solving authentic assessments in economics. We found that generative AI tools perform very well at the lower levels of Bloom's taxonomy while still maintaining a decent level of performance at the higher levels, with "create" being the weakest level of performance. Interestingly, these tools are better able to address numeric-based questions than text-based ones. Moreover, all the generative AI tools exhibit weaknesses in building arguments based on theoretical frameworks, maintaining the coherence of different arguments and providing appropriate references. Our study provides educators with a framework to assess the capabilities of generative AI tools, enabling them to make more informed decisions regarding assessments and learning activities. Our findings demand a strategic reimagining of educational goals and assessments, emphasising higher cognitive skills and calling for a concerted effort to enhance the capabilities of educators in preparing students for a rapidly transforming professional environment.

Implications for practice or policy

- Our proposed framework enables educators to systematically evaluate the capabilities of widely used generative AI tools in assessments and assists them in the assessment design process.
- Tertiary institutions should re-evaluate and redesign programmes and course learning outcomes. The new focus on learning outcomes should address the higher levels of educational goals of Bloom's taxonomy, specifically the "create" level.
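The evaluation workflow the abstract describes — grading each tool's answers to assessment tasks pitched at each level of Bloom's taxonomy, then comparing average performance per level — could be tabulated along these lines. This is a minimal sketch, not the authors' instrument: the tool names come from the study, but the `mean_scores` helper, the 0–10 rubric scale and the sample ratings are hypothetical illustrations.

```python
from collections import defaultdict
from statistics import mean

# The six cognitive levels of the revised Bloom's taxonomy,
# from lowest to highest.
BLOOM_LEVELS = ["remember", "understand", "apply", "analyse", "evaluate", "create"]

def mean_scores(ratings):
    """Average rubric scores per (tool, Bloom level) pair.

    ratings: iterable of (tool, level, score) tuples, one per graded answer.
    Returns a dict mapping (tool, level) to the mean score.
    """
    buckets = defaultdict(list)
    for tool, level, score in ratings:
        buckets[(tool, level)].append(score)
    return {key: mean(vals) for key, vals in buckets.items()}

# Hypothetical rubric scores (0-10); not data from the study.
ratings = [
    ("ChatGPT-4", "remember", 9), ("ChatGPT-4", "remember", 10),
    ("ChatGPT-4", "create", 5),
    ("Google Bard", "remember", 8), ("Google Bard", "create", 4),
]
print(mean_scores(ratings)[("ChatGPT-4", "remember")])  # 9.5
```

Aggregating this way makes the pattern the study reports (strong at "remember", weakest at "create") directly comparable across tools.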
Source journal: Australasian Journal of Educational Technology (Education & Educational Research)
CiteScore: 7.60 | Self-citation rate: 7.30% | Articles per year: 54 | Review time: 36 weeks