Educational Measurement-Issues and Practice: Latest Articles

Weighing the Value of Complex Growth Estimation Methods to Evaluate Individual Student Response to Instruction
IF 2 | Zone 4 (Education) | Q1 Education & Educational Research | Pub Date: 2023-08-24 | DOI: 10.1111/emip.12579
Ethan R. Van Norman

Sophisticated analytic strategies have been proposed as viable methods to improve the quantification of student improvement and to assist educators in making treatment decisions. The performance of three categories of latent growth modeling techniques (linear, quadratic, and dual change) to capture growth in oral reading fluency in response to a 12-week structured supplemental reading intervention among 280 grade three students at-risk for learning disabilities were compared. Although the most complex approach (dual-change) yielded the best model fit indices, there were few practical differences between predicted values from simpler linear models. A discussion to carefully consider the relative benefits and appropriateness of increasingly complex growth modeling strategies to evaluate individual student responses to intervention is offered.

Educational Measurement-Issues and Practice, vol. 42, no. 4, pp. 33-41.
Citations: 0
Does It Matter How the Rigor of High School Coursework Is Measured? Gaps in Coursework Among Students and Across Grades
IF 2 | Zone 4 (Education) | Q1 Education & Educational Research | Pub Date: 2023-08-23 | DOI: 10.1111/emip.12577
Burhan Ogut, Darrick Yee, Ruhan Circi, Nevin Dizdari

Research shows that the intensity of high school course-taking is related to postsecondary outcomes. However, there are various approaches to measuring the intensity of students’ course-taking. This study presents new measures of coursework intensity that rely on differing levels of quantity and quality of coursework. We used these new indices to provide a current description of variations in high school course-taking across grades and student subgroups using a nationally representative dataset, the High School Longitudinal Study of 2009. Results showed that for measures emphasizing the quality of coursework the gaps in coursework among underserved students were larger and there was less upward movement in rigor across grades.

Educational Measurement-Issues and Practice, vol. 42, no. 4, pp. 42-52.
Citations: 0
Exploration of Latent Structure in Test Revision and Review Log Data
IF 2 | Zone 4 (Education) | Q1 Education & Educational Research | Pub Date: 2023-08-14 | DOI: 10.1111/emip.12576
Susu Zhang, Anqi Li, Shiyu Wang

In computer-based tests allowing revision and reviews, examinees' sequence of visits and answer changes to questions can be recorded. The variable-length revision log data introduce new complexities to the collected data but, at the same time, provide additional information on examinees' test-taking behavior, which can inform test development and instructions. In the current study, we used recently proposed statistical learning methods for sequence data to provide an exploratory analysis of item-level revision and review log data. Based on the revision log data collected from computer-based classroom assessments, common prototypes of revisit and review behavior were identified. The relationship between revision behavior and various item, test, and individual covariates was further explored under a Bayesian multivariate generalized linear mixed model.

Educational Measurement-Issues and Practice, vol. 42, no. 4, pp. 53-65.
Citations: 0
Applying a Mixture Rasch Model-Based Approach to Standard Setting
IF 2 | Zone 4 (Education) | Q1 Education & Educational Research | Pub Date: 2023-07-17 | DOI: 10.1111/emip.12571
Michael R. Peabody, Timothy J. Muckle, Yu Meng

The subjective aspect of standard-setting is often criticized, yet data-driven standard-setting methods are rarely applied. Therefore, we applied a mixture Rasch model approach to setting performance standards across several testing programs of various sizes and compared the results to existing passing standards derived from traditional standard-setting methods. We found that heterogeneity of the sample is clearly necessary for the mixture Rasch model approach to standard setting to be useful. While possibly not sufficient to determine passing standards on their own, there may be value in these data-driven models for providing additional validity evidence to support decision-making bodies entrusted with establishing cut scores. They may also provide a useful tool for evaluating existing cut scores and determining if they continue to be supported or if a new study is warranted.

Educational Measurement-Issues and Practice, vol. 42, no. 3, pp. 5-12.
Citations: 0
Do Subject Matter Experts’ Judgments of Multiple-Choice Format Suitability Predict Item Quality?
IF 2 | Zone 4 (Education) | Q1 Education & Educational Research | Pub Date: 2023-07-11 | DOI: 10.1111/emip.12570
Rebecca F. Berenbon, Bridget C. McHugh

To assemble a high-quality test, psychometricians rely on subject matter experts (SMEs) to write high-quality items. However, SMEs are not typically given the opportunity to provide input on which content standards are most suitable for multiple-choice questions (MCQs). In the present study, we explored the relationship between perceived MCQ suitability for a given content standard and the associated item characteristics. Prior to item writing, we surveyed SMEs on MCQ suitability for each content standard. Following field testing, we then used SMEs’ average ratings for each content standard to predict item characteristics for the tests. We analyzed multilevel models predicting item difficulty (p value), discrimination, and nonfunctioning distractor presence. Items were nested within courses and content standards. There was a curvilinear relationship between SMEs’ ratings and item difficulty such that very low MCQ suitability ratings were predictive of easier items. After controlling for item difficulty, items with higher MCQ suitability ratings had higher discrimination and were less likely to have one or more nonfunctioning distractors. This research has practical implications for optimizing test blueprints. Additionally, psychometricians may use these ratings to better prepare for coaching SMEs during item writing.
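
The two classical item statistics named in this abstract can be illustrated with a minimal sketch. It assumes dichotomously scored items (1 = correct, 0 = incorrect) and takes discrimination to be the corrected item-total (point-biserial) correlation; the abstract does not say which discrimination index the authors used, and the function names here are hypothetical.

```python
from statistics import mean, pstdev


def item_difficulty(item_scores):
    """Classical p value: the proportion of examinees answering correctly."""
    return mean(item_scores)


def item_discrimination(item_scores, total_scores):
    """Correlation between the item score and the rest score.

    The rest score is the total score minus the item itself (an assumed,
    commonly used correction so the item does not correlate with itself).
    Returns NaN when either variable has no variance.
    """
    rest = [t - i for i, t in zip(item_scores, total_scores)]
    sd_item, sd_rest = pstdev(item_scores), pstdev(rest)
    if sd_item == 0 or sd_rest == 0:
        return float("nan")
    m_item, m_rest = mean(item_scores), mean(rest)
    cov = mean((x - m_item) * (y - m_rest) for x, y in zip(item_scores, rest))
    return cov / (sd_item * sd_rest)


# Illustrative data only: four examinees, one item, and their total scores.
item = [1, 1, 0, 0]
total = [3, 2, 1, 0]
difficulty = item_difficulty(item)
discrimination = item_discrimination(item, total)
```

A higher p value means an easier item, which is why the abstract describes low-suitability content standards as yielding "easier items".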

Educational Measurement-Issues and Practice, vol. 42, no. 3, pp. 13-21.
Citations: 0
Defining Test-Score Interpretation, Use, and Claims: Delphi Study for the Validity Argument
IF 2 | Zone 4 (Education) | Q1 Education & Educational Research | Pub Date: 2023-06-27 | DOI: 10.1111/emip.12569
Timothy D. Folger, Jonathan Bostic, Erin E. Krupa

Validity is a fundamental consideration of test development and test evaluation. The purpose of this study is to define and reify three key aspects of validity and validation, namely test-score interpretation, test-score use, and the claims supporting interpretation and use. This study employed a Delphi methodology to explore how experts in validity and validation conceptualize test-score interpretation, use, and claims. Definitions were developed through multiple iterations of data collection and analysis. By clarifying the language used when conducting validation, validation may be more accessible to a broader audience, including but not limited to test developers, test users, and test consumers.

Educational Measurement-Issues and Practice, vol. 42, no. 3, pp. 22-38.
Citations: 1
Hierarchical Agglomerative Clustering to Detect Test Collusion on Computer-Based Tests
IF 2 | Zone 4 (Education) | Q1 Education & Educational Research | Pub Date: 2023-06-19 | DOI: 10.1111/emip.12568
Soo Jeong Ingrisone, James N. Ingrisone

There has been a growing interest in approaches based on machine learning (ML) for detecting test collusion as an alternative to the traditional methods. Clustering analysis under an unsupervised learning technique appears especially promising to detect group collusion. In this study, the effectiveness of hierarchical agglomerative clustering (HAC) for detecting aberrant test takers on Computer-Based Testing (CBT) is explored. Random forest ensembles are used to evaluate the accuracy of the clustering and find the important features to classify the aberrant test takers. Testing data from a certification exam is used. The level of overlap between the exact response matches on incorrectly keyed items in the exam preparation material and HAC are compared. Integrating HAC as an investigation mean is promising in this field to improve the accuracy of classification of aberrant test takers.
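
As a rough illustration of the clustering idea in this abstract, the sketch below runs hierarchical agglomerative clustering on examinees' response strings. The distance measure (Hamming distance between response strings) and the average-linkage rule are assumptions chosen for simplicity; the abstract does not state which features, distance, or linkage the authors used, and the random forest evaluation step is not shown.

```python
def hamming(a, b):
    """Number of items on which two response strings differ."""
    return sum(x != y for x, y in zip(a, b))


def hac_average_linkage(responses, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain.

    Clusters are lists of examinee indices into `responses`.
    """
    clusters = [[i] for i in range(len(responses))]

    def dist(c1, c2):
        # Average pairwise Hamming distance between members of two clusters.
        pairs = [(i, j) for i in c1 for j in c2]
        return sum(hamming(responses[i], responses[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        a, b = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: dist(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Illustrative data only: examinees 0-2 give nearly identical answers.
answers = ["ABCDA", "ABCDA", "ABCDB", "CDABD", "DCBAC"]
groups = hac_average_linkage(answers, n_clusters=3)
```

In this toy data the three near-identical response strings end up in one cluster, which is the kind of group a collusion investigation would then examine further.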

Educational Measurement-Issues and Practice, vol. 42, no. 3, pp. 39-49.
Citations: 0
A Probabilistic Filtering Approach to Non-Effortful Responding
IF 2 | Zone 4 (Education) | Q1 Education & Educational Research | Pub Date: 2023-06-16 | DOI: 10.1111/emip.12567
Esther Ulitzsch, Benjamin W. Domingue, Radhika Kapoor, Klint Kanopka, Joseph A. Rios

Common response-time-based approaches for non-effortful response behavior (NRB) in educational achievement tests filter responses that are associated with response times below some threshold. These approaches are, however, limited in that they require a binary decision on whether a response is classified as stemming from NRB; thus ignoring potential classification uncertainty in resulting parameter estimates. We developed a response-time-based probabilistic filtering procedure that overcomes this limitation. The procedure is rooted in the principles of multiple imputation. Instead of creating multiple plausible replacements of missing data, however, multiple data sets are created that represent plausible filtered response data. We propose two different approaches to filtering models, originating in different research traditions and conceptualizations of response-time-based identification of NRB. The first approach uses Gaussian mixture modeling to identify a response time subcomponent stemming from NRB. Plausible filtered data sets are created based on examinees' posterior probabilities of belonging to the NRB subcomponent. The second approach defines a plausible range of response time thresholds and creates plausible filtered data sets by drawing multiple response time thresholds from the defined range. We illustrate the workings of the proposed procedure as well as differences between the proposed filtering models based on both simulated data and empirical data from PISA 2018.
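
The second approach described in this abstract (drawing several response time thresholds from a plausible range) can be sketched as follows. This is only an illustration under stated assumptions: thresholds are drawn uniformly, filtered responses are marked as `None`, and the function name and arguments are hypothetical rather than taken from the authors' code.

```python
import random


def make_filtered_datasets(responses, times, low, high, n_sets=5, seed=0):
    """Create several plausible filtered copies of one examinee's responses.

    For each copy, a single threshold is drawn uniformly from [low, high]
    (an assumed distribution); responses whose response time falls below
    that threshold are treated as non-effortful and replaced by None.
    """
    rng = random.Random(seed)
    datasets = []
    for _ in range(n_sets):
        threshold = rng.uniform(low, high)
        filtered = [r if t >= threshold else None for r, t in zip(responses, times)]
        datasets.append((threshold, filtered))
    return datasets


# Illustrative data only: response times in seconds and scored responses.
times = [1.0, 2.5, 10.0, 12.0]
scores = [0, 1, 1, 0]
plausible_sets = make_filtered_datasets(scores, times, low=2.0, high=5.0)
```

A response at 1.0 s is filtered in every copy, responses at 10 s or more are never filtered, and the 2.5 s response is filtered only in some copies. Analyzing each copy and pooling the results, in the spirit of multiple imputation, is what lets the classification uncertainty carry through to the parameter estimates.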

Educational Measurement-Issues and Practice, vol. 42, no. 3, pp. 50-64.
Citations: 0
Issue Cover
IF 2 | Zone 4 (Education) | Q1 Education & Educational Research | Pub Date: 2023-06-09 | DOI: 10.1111/emip.12514
Educational Measurement-Issues and Practice, vol. 42, no. 2.
Citations: 0
Digital Module 32: Understanding and Mitigating the Impact of Low Effort on Common Uses of Test and Survey Scores
IF 2 | Zone 4 (Education) | Q1 Education & Educational Research | Pub Date: 2023-06-09 | DOI: 10.1111/emip.12555
James Soland

Most individuals who take, interpret, design, or score tests are aware that examinees do not always provide full effort when responding to items. However, many such individuals are not aware of how pervasive the issue is, what its consequences are, and how to address it. In this digital ITEMS module, Dr. James Soland will help fill these gaps in the knowledge base. Specifically, the module enumerates how frequently behaviors associated with low effort occur, and some of the ways they can distort inferences based on test scores. Then, the module explains some of the most common approaches for identifying low effort, and correcting for it when examining test scores. Brief discussion is also given to how these methods align with, and diverge from, those used to deal with low respondent effort in self-report contexts. Data and code are also provided such that readers can better implement some of the desired methods in their own work.

Educational Measurement-Issues and Practice, vol. 42, no. 2, pp. 75-76.
Citations: 0