Evaluating IBM’s Watson natural language processing artificial intelligence as a short-answer categorization tool for physics education research

Physical Review Physics Education Research, published 2024-03-22. DOI: 10.1103/physrevphyseducres.20.010116
Jennifer Campbell, Katie Ansell, Tim Stelzer

Abstract

Recent advances in publicly available natural language processing (NLP) tools may make the analysis of student short-answer responses in physics education research (PER) more efficient. We train a state-of-the-art NLP system, IBM’s Watson, and test its agreement with human coders using data from two studies that gathered text responses in which students explain their reasoning on physics-related questions. The first study analyzes 479 student responses to a lab data analysis question and categorizes them by main idea. The second study analyzes 732 student answers to identify the presence or absence of each of two conceptual themes. When Watson is trained on approximately one-third to one-half of the samples, we find that samples labeled with high confidence scores have accuracy similar to the agreement between human coders; for lower confidence scores, however, human coders label more accurately than the NLP. In addition to studying Watson’s overall accuracy, we use this analysis to better understand the factors that affect how Watson categorizes responses. Using the data from the categorization study, we find that Watson’s algorithm does not appear to be affected by the disproportionate representation of categories in the training set, and we examine mislabeled statements to identify vocabulary and phrasing that may increase the rate of false positives. Based on this work, we find that, with careful consideration of the research study design and an awareness of the NLP’s limitations, Watson may be a useful tool for large-scale PER studies or for classroom analysis.
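The abstract's central comparison (machine labels checked against human coders, with accuracy depending on Watson's confidence score) can be illustrated with a small sketch. It assumes each response already carries a human label, the classifier's predicted label, and a confidence score in [0, 1]; it does not call any Watson API, and how those scores are exported is not shown. The abstract does not name the agreement statistic used, so Cohen's kappa is included here only as a common, assumed choice. The class and function names, the category names, the threshold of 0.8, and the toy data are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class CodedResponse:
    human: str         # category assigned by the human coder (treated as the reference label)
    machine: str       # category predicted by the NLP classifier (hypothetical export)
    confidence: float  # classifier's confidence score for its prediction, assumed in [0, 1]


def _accuracy(group: List[CodedResponse]) -> Optional[float]:
    """Fraction of responses whose machine label matches the human label (None if empty)."""
    if not group:
        return None
    return sum(r.human == r.machine for r in group) / len(group)


def split_accuracy(
    responses: Sequence[CodedResponse], threshold: float
) -> Tuple[Optional[float], int, Optional[float], int]:
    """Accuracy and count for responses at/above and below a confidence threshold."""
    high = [r for r in responses if r.confidence >= threshold]
    low = [r for r in responses if r.confidence < threshold]
    return _accuracy(high), len(high), _accuracy(low), len(low)


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa between two coders who labeled the same responses."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: product of each coder's marginal proportions, summed over labels.
    expected = sum(
        (counts_a[c] / n) * (counts_b[c] / n) for c in set(counts_a) | set(counts_b)
    )
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout: perfect agreement
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy data; categories and scores are invented for illustration only.
examples = [
    CodedResponse("uncertainty", "uncertainty", 0.95),
    CodedResponse("uncertainty", "uncertainty", 0.90),
    CodedResponse("trend", "trend", 0.85),
    CodedResponse("trend", "uncertainty", 0.40),
    CodedResponse("uncertainty", "trend", 0.35),
    CodedResponse("trend", "trend", 0.30),
]

print(split_accuracy(examples, threshold=0.8))
print(cohen_kappa([r.human for r in examples], [r.machine for r in examples]))
```

Splitting at a threshold, rather than reporting one overall accuracy, keeps high-confidence and low-confidence labels separate, which is the distinction the abstract draws. The same `cohen_kappa` function could be applied to two human coders' labels to obtain a human baseline for comparison.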
