Predicting Short Response Ratings with Non-Content Related Features: A Hierarchical Modeling Approach

Aubrey Condor
DOI: arxiv-2405.08574
Journal: arXiv - STAT - Other Statistics
Published: 2024-05-14 (Journal Article)
Citations: 0

Abstract

We explore whether human ratings of open-ended responses can be explained by non-content-related features, and whether such effects vary across different mathematics-related items. When scoring is rigorously defined and rooted in a measurement framework, educators intend that the features of a response indicative of the respondent's level of ability contribute to scores. However, we find that features such as response length, a grammar score of the response, and a metric relating to key-phrase frequency are significant predictors of response ratings. Although our findings are not causally conclusive, they may propel us to be more critical of the way in which we assess open-ended responses, especially in high-stakes scenarios. Educators take great care to provide unbiased, consistent ratings, but it may be that extraneous features unrelated to those intended to be rated are being evaluated.
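To illustrate the kind of analysis the abstract describes, the sketch below fits a hierarchical (mixed-effects) regression predicting ratings from three non-content features, with a random intercept per item. This is not the paper's actual code or data: the feature names (`length`, `grammar`, `key_phrase`), the simulated dataset, and the coefficient values are all invented for illustration, and the model is a minimal random-intercept specification rather than the authors' full approach.

```python
# Illustrative sketch only: a random-intercept mixed-effects model in the
# spirit of the hierarchical approach described above. Data are simulated;
# features and effect sizes are assumptions, not the paper's results.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_items, n_per_item = 10, 60
rows = []
for item in range(n_items):
    item_effect = rng.normal(0, 0.5)           # item-level random intercept
    length = rng.normal(50, 15, n_per_item)    # response length (words)
    grammar = rng.normal(0, 1, n_per_item)     # standardized grammar score
    key_phrase = rng.normal(0, 1, n_per_item)  # key-phrase frequency metric
    rating = (1.0 + item_effect
              + 0.02 * length + 0.3 * grammar + 0.2 * key_phrase
              + rng.normal(0, 0.4, n_per_item))
    for i in range(n_per_item):
        rows.append(dict(item=item, length=length[i], grammar=grammar[i],
                         key_phrase=key_phrase[i], rating=rating[i]))
df = pd.DataFrame(rows)

# Fixed effects for the three non-content features; responses grouped
# by item so each item gets its own intercept.
model = smf.mixedlm("rating ~ length + grammar + key_phrase",
                    df, groups=df["item"])
result = model.fit()
print(result.params)
```

A significant fixed-effect coefficient here would play the role of the finding reported above: a non-content feature that predicts ratings even though, under the intended measurement framework, it should not.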