Predicting Short Response Ratings with Non-Content Related Features: A Hierarchical Modeling Approach

Aubrey Condor
DOI: arxiv-2405.08574
Journal: arXiv - STAT - Other Statistics
Published: 2024-05-14 (Journal Article)
Citations: 0

Abstract

We explore whether human ratings of open-ended responses can be explained by non-content-related features, and whether such effects vary across different mathematics-related items. When scoring is rigorously defined and rooted in a measurement framework, educators intend that the features of a response indicative of the respondent's level of ability contribute to scores. However, we find that features such as response length, a grammar score of the response, and a metric relating to key-phrase frequency are significant predictors of response ratings. Although our findings are not causally conclusive, they may propel us to be more critical of the way in which we assess open-ended responses, especially in high-stakes scenarios. Educators take great care to provide unbiased, consistent ratings, but it may be that extraneous features unrelated to those intended to be rated are being evaluated.
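To illustrate the kind of analysis the abstract describes, the sketch below fits a hierarchical (mixed-effects) regression predicting ratings from three non-content features, with a random intercept per item. This is not the paper's actual code or data: the feature names (`length`, `grammar`, `key_phrase`), the simulated dataset, and the coefficient values are all invented for illustration, and the model is a minimal random-intercept specification rather than the authors' full approach.

```python
# Illustrative sketch only: a random-intercept mixed-effects model in the
# spirit of the hierarchical approach described above. Data are simulated;
# features and effect sizes are assumptions, not the paper's results.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_items, n_per_item = 10, 60
rows = []
for item in range(n_items):
    item_effect = rng.normal(0, 0.5)           # item-level random intercept
    length = rng.normal(50, 15, n_per_item)    # response length (words)
    grammar = rng.normal(0, 1, n_per_item)     # standardized grammar score
    key_phrase = rng.normal(0, 1, n_per_item)  # key-phrase frequency metric
    rating = (1.0 + item_effect
              + 0.02 * length + 0.3 * grammar + 0.2 * key_phrase
              + rng.normal(0, 0.4, n_per_item))
    for i in range(n_per_item):
        rows.append(dict(item=item, length=length[i], grammar=grammar[i],
                         key_phrase=key_phrase[i], rating=rating[i]))
df = pd.DataFrame(rows)

# Fixed effects for the three non-content features; responses grouped
# by item so each item gets its own intercept.
model = smf.mixedlm("rating ~ length + grammar + key_phrase",
                    df, groups=df["item"])
result = model.fit()
print(result.params)
```

A significant fixed-effect coefficient here would play the role of the finding reported above: a non-content feature that predicts ratings even though, under the intended measurement framework, it should not.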