{"title":"Using IBM’s Watson to automatically evaluate student short answer responses","authors":"Jennifer Campbell, K. Ansell, Timothy J Stelzer","doi":"10.1119/perc.2022.pr.campbell","DOIUrl":null,"url":null,"abstract":"Recent advancements in natural language processing (NLP) have generated interest in using computers to assist in the coding and analysis of students’ short answer responses for PER or classroom applications. We train a state-of-the-art NLP, IBM’s Watson, and test its agreement with humans in three varying experimental cases. By exploring these cases, we begin to understand how Watson behaves with ideal and more realistic data, across different levels of training, and across different types of categorization tasks. We find that Watson’s self-reported confidence for categorizing samples is reasonably well-aligned with its accuracy, although this can be impacted by features of the data being analyzed. Based on these results, we discuss implications and suggest potential applications of this technology to education research.","PeriodicalId":253382,"journal":{"name":"2022 Physics Education Research Conference Proceedings","volume":"30 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 Physics Education Research Conference Proceedings","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1119/perc.2022.pr.campbell","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Using IBM’s Watson to automatically evaluate student short answer responses
Recent advances in natural language processing (NLP) have generated interest in using computers to assist in coding and analyzing students’ short answer responses for physics education research (PER) or classroom applications. We train a state-of-the-art NLP system, IBM’s Watson, and test its agreement with humans in three different experimental cases. By exploring these cases, we begin to understand how Watson behaves with ideal and more realistic data, across different levels of training, and across different types of categorization tasks. We find that Watson’s self-reported confidence for categorizing samples is reasonably well aligned with its accuracy, although this alignment can be affected by features of the data being analyzed. Based on these results, we discuss implications and suggest potential applications of this technology to education research.
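One way to check whether a classifier's self-reported confidence tracks its accuracy is to bin its predictions by confidence and compare each bin's accuracy against human labels. The sketch below is a minimal illustration of that idea, not the authors' analysis: the function name, the bin count, and the sample records are all hypothetical, and it does not call any Watson API.

```python
from collections import defaultdict


def accuracy_by_confidence(records, n_bins=5):
    """Group (confidence, predicted, human) records into equal-width
    confidence bins and return {bin_index: (count, accuracy)}."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for confidence, predicted, human in records:
        # Clamp so that confidence == 1.0 lands in the top bin.
        b = min(int(confidence * n_bins), n_bins - 1)
        totals[b] += 1
        correct[b] += int(predicted == human)
    return {b: (totals[b], correct[b] / totals[b]) for b in sorted(totals)}


# Hypothetical records for illustration only; not data from the paper.
# Each tuple: (classifier confidence, classifier label, human label).
records = [
    (0.95, "correct", "correct"),
    (0.91, "incorrect", "incorrect"),
    (0.62, "correct", "incorrect"),
    (0.58, "correct", "correct"),
    (0.35, "incorrect", "correct"),
]

N_BINS = 5
table = accuracy_by_confidence(records, n_bins=N_BINS)
for b, (n, acc) in table.items():
    low, high = b / N_BINS, (b + 1) / N_BINS
    print(f"confidence {low:.1f}-{high:.1f}: n={n}, accuracy={acc:.2f}")
```

If confidence is well aligned with accuracy, the per-bin accuracy should rise roughly in step with the bin's confidence range; large gaps in a bin suggest the confidence values are less trustworthy for that kind of data.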