Chatbot responses suggest that hypothetical biology questions are harder than realistic ones

Journal of Microbiology & Biology Education (IF 1.5; JCR Q2, Education, Scientific Disciplines). Pub Date: 2023-11-07. DOI: 10.1128/jmbe.00153-23

Gregory J. Crowther, Usha Sankar, Leena S. Knight, Deborah L. Myers, Kevin T. Patton, Lekelia D. Jenkins, Thomas A. Knight


Abstract

The biology education literature includes compelling assertions that unfamiliar problems are especially useful for revealing students’ true understanding of biology. However, there is only limited evidence that such novel problems have different cognitive requirements than more familiar problems. Here, we sought additional evidence by using chatbots based on large language models as models of biology students. For human physiology and cell biology, we developed sets of realistic and hypothetical problems matched to the same lesson learning objectives (LLOs). Problems were considered hypothetical if (i) known biological entities (molecules and organs) were given atypical or counterfactual properties (redefinition) or (ii) fictitious biological entities were introduced (invention). Several chatbots scored significantly worse on hypothetical problems than on realistic problems, with scores declining by an average of 13%. Among hypothetical questions, redefinition questions appeared especially difficult, with many chatbots scoring as if guessing randomly. These results suggest that, for a given LLO, hypothetical problems may have different cognitive demands than realistic problems and may more accurately reveal students’ ability to apply biology core concepts to diverse contexts. The Test Question Templates (TQT) framework, which explicitly connects LLOs with examples of assessment questions, can help educators generate problems that are challenging (due to their novelty), yet fair (due to their alignment with pre-specified LLOs). Finally, ChatGPT’s rapid improvement toward expert-level answers suggests that future educators cannot reasonably expect to ignore or outwit chatbots but must do what we can to make assessments fair and equitable.
