Evaluating IBM’s Watson natural language processing artificial intelligence as a short-answer categorization tool for physics education research

Jennifer Campbell, Katie Ansell, Tim Stelzer

Physical Review Physics Education Research (Q1, Education & Educational Research; Impact Factor 2.6), published 2024-03-22. DOI: 10.1103/physrevphyseducres.20.010116
Recent advances in publicly available natural language processors (NLP) may enhance the efficiency of analyzing student short-answer responses in physics education research (PER). We train a state-of-the-art NLP, IBM’s Watson, and test its agreement with human coders using two different studies that gathered text responses in which students explain their reasoning on physics-related questions. The first study analyzes 479 student responses to a lab data analysis question and categorizes them by main idea. The second study analyzes 732 student answers to identify the presence or absence of each of two conceptual themes. When training Watson with approximately one-third to half of the samples, we find that samples labeled with high confidence scores have accuracy similar to human interrater agreement, while for lower confidence scores humans outperform the NLP’s labeling accuracy. In addition to studying Watson’s overall accuracy, we use this analysis to better understand factors that impact how Watson categorizes. Using the data from the categorization study, we find that Watson’s algorithm does not appear to be impacted by the disproportionate representation of categories in the training set, and we examine mislabeled statements to identify vocabulary and phrasing that may increase the rate of false positives. Based on this work, we find that, with careful consideration of the research study design and an awareness of the NLP’s limitations, Watson may be a useful tool for large-scale PER studies or classroom analysis applications.
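The abstract's key workflow — accept machine labels above a confidence threshold and route the rest to human coders — can be sketched in a few lines. This is an illustrative sketch only, not the authors' code and not IBM Watson's API: the `triage` and `agreement` helpers and the 0.8 threshold are hypothetical names and values chosen for the example.

```python
def triage(predictions, threshold=0.8):
    """Split (label, confidence) predictions into auto-accepted and
    human-review buckets based on a confidence threshold.
    The threshold value is illustrative, not taken from the paper."""
    auto, review = [], []
    for label, conf in predictions:
        (auto if conf >= threshold else review).append((label, conf))
    return auto, review


def agreement(machine_labels, human_labels):
    """Fraction of items where the machine label matches the human code —
    a simple percent-agreement stand-in for the paper's comparison."""
    if not machine_labels:
        return 0.0
    matches = sum(m == h for m, h in zip(machine_labels, human_labels))
    return matches / len(machine_labels)
```

For example, with hypothetical classifier output, `triage([("energy", 0.95), ("force", 0.62)], threshold=0.8)` keeps the first prediction for automatic labeling and sends the second to a human coder — mirroring the paper's finding that high-confidence samples approach human accuracy while low-confidence samples do not.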
Journal introduction:
PRPER covers all educational levels, from elementary through graduate education. All topics in experimental and theoretical physics education research are accepted, including, but not limited to:
Educational policy
Instructional strategies and materials development
Research methodology
Epistemology, attitudes, and beliefs
Learning environment
Scientific reasoning and problem solving
Diversity and inclusion
Learning theory
Student participation
Faculty and teacher professional development