Evaluate Chat‐GPT's programming capability in Swift through real university exam questions

Software: Practice and Experience. Pub Date: 2024-03-21. DOI: 10.1002/spe.3330

Zizhuo Zhang, Lian Wen, Yanfei Jiang, Yongli Liu

Citations: 0

Abstract

In this study, we evaluate the programming capabilities of OpenAI's GPT‐3.5 and GPT‐4 models using Swift‐based exam questions from a third‐year university course. The results indicate that both GPT models generally outperform the average student score, yet they do not consistently exceed the performance of the top students. This comparison highlights areas where the GPT models excel and where they fall short, providing a nuanced view of their current programming proficiency. The study also reveals surprising instances where GPT‐3.5 outperforms GPT‐4, suggesting complex variations in AI model capabilities. By providing a clear benchmark of GPT's programming skills in an academic context, our research contributes valuable insights for future advancements in AI programming education and underscores the need for continued development to fully realize AI's potential in educational settings.



