
Latest Publications in Applied Measurement in Education

Detection of Outliers in Anchor Items Using Modified Rasch Fit Statistics
IF 1.5 · Tier 4 (Education) · Q3 EDUCATION & EDUCATIONAL RESEARCH · Pub Date: 2021-10-02 · DOI: 10.1080/08957347.2021.1987901
Chunyan Liu, D. Jurich, C. Morrison, Irina Grabovsky
ABSTRACT The existence of outliers in the anchor items can be detrimental to the estimation of examinee ability and undermine the validity of score interpretation across forms. In practice, however, anchor item performance can become distorted for various reasons. This study compares the performance of modified INFIT and OUTFIT Rasch statistics with the Logit Difference approach (using 0.3 and 0.5 as predetermined cutoff values) and the Robust z statistic (using 1.645 and 2.7 as cutoff values) through a simulation study that varies sample size, proportion of outliers, item difficulty drift direction, and group difference magnitude. The results suggest that the modified INFIT and OUTFIT statistics perform very similarly and outperform the other methods in all aspects, including sensitivity in flagging outliers, specificity in flagging non-outliers, recovery of the translation constant, and recovery of examinee ability, in all simulated conditions.
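As a rough illustration of the fit statistics this abstract builds on, the sketch below computes the standard (unmodified) INFIT and OUTFIT mean-square statistics for a single item under the dichotomous Rasch model. The function names and the toy data are illustrative; the paper's modified versions for anchor-item screening differ from this baseline.

```python
import math

def rasch_p(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_fit(responses, thetas, b):
    """Standard INFIT and OUTFIT mean-square statistics for one item.

    responses: 0/1 scores of each examinee on the item
    thetas:    examinee ability estimates
    b:         item difficulty
    """
    sq_resid, variances, z_sq = [], [], []
    for x, theta in zip(responses, thetas):
        p = rasch_p(theta, b)
        w = p * (1.0 - p)        # model variance of the response
        r2 = (x - p) ** 2        # squared residual
        sq_resid.append(r2)
        variances.append(w)
        z_sq.append(r2 / w)      # standardized squared residual
    outfit = sum(z_sq) / len(z_sq)          # unweighted mean square
    infit = sum(sq_resid) / sum(variances)  # information-weighted mean square
    return infit, outfit

# Toy example: responses consistent with the model (a well-fitting item)
thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]
responses = [0, 0, 1, 1, 1]
infit, outfit = item_fit(responses, thetas, b=0.0)
```

Values near 1.0 indicate good fit; flagging an anchor item as an outlier amounts to comparing these statistics against a cutoff, analogous to the Logit Difference and Robust z cutoffs named above.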
Applied Measurement in Education, 34(1), 327–341.
Citations: 3
The Effect of Peer Assessment on Non-Cognitive Outcomes: A Meta-Analysis
IF 1.5 · Tier 4 (Education) · Q3 EDUCATION & EDUCATIONAL RESEARCH · Pub Date: 2021-07-03 · DOI: 10.1080/08957347.2021.1933980
Hongli Li, Jacquelyn A. Bialo, Yao Xiong, C. Hunter, Xiuyan Guo
ABSTRACT Peer assessment is increasingly being used as a pedagogical tool in classrooms. Participating in peer assessment may enhance student learning in both cognitive and non-cognitive aspects. In this study, we focused on non-cognitive aspects by performing a meta-analysis to synthesize the effect of peer assessment on students’ non-cognitive learning outcomes. After a systematic search, we included 43 effect sizes from 19 studies, which mostly involved learning strategies and academic mind-sets as non-cognitive outcomes. Using a random effects model, we found that students who had participated in peer assessment showed a 0.289 standard deviation unit improvement in non-cognitive outcomes as compared to students who had not participated in peer assessment. Further, we found that the effect of peer assessment on non-cognitive outcomes was significantly larger when both scores and comments were provided to students or when assessors and assessees were matched at random. Our findings can be used as a basis for further investigation into how best to use peer assessment as a learning tool, especially to promote non-cognitive development.
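The random-effects pooling described in this abstract can be sketched with the DerSimonian–Laird estimator, a common choice for such meta-analyses (the abstract does not name the specific estimator used; the function name and toy inputs are illustrative).

```python
def dersimonian_laird(effects, variances):
    """Pool study effect sizes under a DerSimonian-Laird random-effects model.

    effects:   per-study effect sizes (e.g., standardized mean differences)
    variances: per-study sampling variances
    Returns (pooled effect, standard error, between-study variance tau^2).
    """
    w_fixed = [1.0 / v for v in variances]
    sum_w = sum(w_fixed)
    fixed = sum(w * y for w, y in zip(w_fixed, effects)) / sum_w
    # Cochran's Q heterogeneity statistic
    q = sum(w * (y - fixed) ** 2 for w, y in zip(w_fixed, effects))
    df = len(effects) - 1
    c = sum_w - sum(w * w for w in w_fixed) / sum_w
    tau2 = max(0.0, (q - df) / c)  # between-study variance (truncated at 0)
    w_rand = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(w_rand, effects)) / sum(w_rand)
    se = (1.0 / sum(w_rand)) ** 0.5
    return pooled, se, tau2

# Toy example with three homogeneous effect sizes
pooled, se, tau2 = dersimonian_laird([0.2, 0.3, 0.4], [0.01, 0.01, 0.01])
```

With homogeneous studies, tau^2 collapses to zero and the pooled estimate reduces to the fixed-effect (inverse-variance) mean; the 0.289 SD figure above is the analogous pooled estimate over the 43 effect sizes.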
Applied Measurement in Education, 34(1), 179–203.
Citations: 11
Bayesian Estimation and Testing of a Linear Logistic Test Model for Learning during the Test
IF 1.5 · Tier 4 (Education) · Q3 EDUCATION & EDUCATIONAL RESEARCH · Pub Date: 2021-07-03 · DOI: 10.1080/08957347.2021.1933982
J. H. Lozano, J. Revuelta
ABSTRACT The present study proposes a Bayesian approach for estimating and testing the operation-specific learning model, a variant of the linear logistic test model that allows for the measurement of the learning that occurs during a test as a result of the repeated use of the operations involved in the items. The advantages of using a Bayesian framework compared to the traditional frequentist approach are discussed. The application of the model is illustrated with real data from a logical ability test. The results show how the incorporation of previous practice into the linear logistic model improves the fit of the model as well as the prediction of the Rasch item difficulty estimates. The model provides evidence of learning associated with two of the logic operations involved in the items, which supports the hypothesis of practice effects in deductive reasoning tasks.
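The core of the linear logistic test model (LLTM) is the decomposition of item difficulty into a weighted sum of operation difficulties; the operation-specific learning variant additionally lets repeated use of an operation during the test shift difficulty. A minimal sketch of that structure follows; all function and parameter names (`etas`, `deltas`, `prior_uses`) are illustrative, not the paper's notation.

```python
import math

def lltm_difficulty(q_row, etas):
    """LLTM: item difficulty as a weighted sum of operation difficulties.

    q_row: row of the Q-matrix (how often each operation occurs in the item)
    etas:  basic parameters (difficulty contributed by each operation)
    """
    return sum(q * eta for q, eta in zip(q_row, etas))

def op_learning_difficulty(q_row, etas, prior_uses, deltas):
    """Operation-specific learning variant (sketch): each prior use of
    operation k earlier in the test reduces difficulty by delta_k."""
    base = lltm_difficulty(q_row, etas)
    practice = sum(q * n * d for q, n, d in zip(q_row, prior_uses, deltas))
    return base - practice

def p_correct(theta, b):
    """Rasch response probability given ability theta and difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Item using operations 1 and 2; operation 1 was already practiced twice
b_base = lltm_difficulty([1, 1, 0], [0.5, 0.3, 1.0])
b_practiced = op_learning_difficulty([1, 1, 0], [0.5, 0.3, 1.0],
                                     [2, 0, 0], [0.1, 0.1, 0.1])
```

Comparing `b_base` and `b_practiced` shows how positive learning parameters make later items that reuse an operation effectively easier, which is the practice effect the study tests for.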
Applied Measurement in Education, 34(1), 223–235.
Citations: 3
Development and Use of Anchoring Vignettes: Psychometric Investigations and Recommendations for a Nonparametric Approach
IF 1.5 · Tier 4 (Education) · Q3 EDUCATION & EDUCATIONAL RESEARCH · Pub Date: 2021-07-03 · DOI: 10.1080/08957347.2021.1933983
HyeSun Lee, W. Smith, Angel Martinez, Heather Ferris, Joe Bova
ABSTRACT The aim of the current research was to provide recommendations to facilitate the development and use of anchoring vignettes (AVs) for cross-cultural comparisons in education. Study 1 identified six factors leading to order violations and ties in AV responses based on cognitive interviews with 15-year-old students. The factors were categorized into three domains: varying levels of AV format familiarity, differential interpretations of content, and individual differences in processing information. To identify the most appropriate approach for treating order violations and the best re-scaling method, Study 2 conducted Monte Carlo simulations manipulating three factors and incorporating five response styles. Study 2 found that the AV approach improved accuracy in score estimation by successfully controlling for response styles. The results also revealed that the reordering approach to treating order violations, combined with the re-scaling method assigning the middle value among possible scores, produced the most accurate estimation. Along with strategies to develop AVs, strengths and limitations of the implemented nonparametric AV approach were discussed in comparison to item response theory modeling for response styles.
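The nonparametric AV scoring the abstract refers to follows the King and Wand idea of locating a self-rating relative to the respondent's own vignette ratings. A minimal sketch, assuming the vignette ratings are already in their expected order (i.e., after any reordering treatment of order violations):

```python
def av_score(self_rating, vignette_ratings):
    """Nonparametric anchoring-vignette score (sketch, after King & Wand).

    self_rating:      the respondent's self-report on the rating scale
    vignette_ratings: the same respondent's ratings of J vignettes,
                      assumed ordered from least to most severe
    Returns a rescaled score in 1 .. 2J+1: odd values fall between
    vignettes, even values tie with a vignette.
    """
    score = 1
    for z in vignette_ratings:
        if self_rating > z:
            score += 2      # strictly above this vignette
        elif self_rating == z:
            score += 1      # tied with this vignette
            break
        else:
            break           # below this vignette
    return score
```

Because the score is defined only through the ordering of the respondent's own answers, it is invariant to idiosyncratic scale use, which is how the approach controls for response styles.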
Applied Measurement in Education, 34(1), 204–222.
Citations: 0
Efficient Estimation of Mean Ability Growth Using Vertical Scaling
IF 1.5 · Tier 4 (Education) · Q3 EDUCATION & EDUCATIONAL RESEARCH · Pub Date: 2021-06-15 · DOI: 10.1080/08957347.2021.1933981
Jonas Bjermo, Frank Miller
ABSTRACT In recent years, the interest in measuring growth in student ability in various subjects between different grades in school has increased. Therefore, good precision in the estimated growth is of importance. This paper aims to compare estimation methods and test designs when it comes to precision and bias of the estimated growth of mean ability between two groups of students that differ substantially. This is performed by a simulation study. One- and two-parameter item response models are assumed and the estimated abilities are vertically scaled using the non-equivalent anchor test design by estimating the abilities in one single run, so-called concurrent calibration. The connection between the test design and the Fisher information is also discussed. The results indicate that the expected a posteriori estimation method is preferred when estimating differences in mean ability between groups. Results also indicate that a test design with common items of medium difficulty leads to better precision, which coincides with previous results from horizontal equating.
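The expected a posteriori (EAP) estimator favored in this study can be sketched for a single examinee under the Rasch model with a standard-normal prior, using simple fixed-point quadrature (the paper's setting is more general; this toy version is for illustration only).

```python
import math

def eap_ability(responses, difficulties, n_quad=41):
    """EAP ability estimate: posterior mean of theta given 0/1 responses,
    computed on an evenly spaced quadrature grid over [-4, 4]."""
    nodes = [-4.0 + 8.0 * i / (n_quad - 1) for i in range(n_quad)]
    post = []
    for t in nodes:
        prior = math.exp(-0.5 * t * t)       # standard-normal kernel
        like = 1.0
        for x, b in zip(responses, difficulties):
            p = 1.0 / (1.0 + math.exp(-(t - b)))
            like *= p if x == 1 else (1.0 - p)
        post.append(prior * like)            # unnormalized posterior
    total = sum(post)
    return sum(t * w for t, w in zip(nodes, post)) / total

# One correct and one incorrect answer on equally difficult items:
# the posterior is symmetric, so the EAP estimate sits at zero
theta_hat = eap_ability([1, 0], [0.0, 0.0])
```

Unlike maximum likelihood, the EAP estimate remains finite even for all-correct or all-incorrect response patterns, one reason it behaves well when estimating group mean differences.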
Applied Measurement in Education, 34(1), 163–178.
Citations: 1
Violation of Conditional Independence in the Many-Facets Rasch Model
IF 1.5 · Tier 4 (Education) · Q3 EDUCATION & EDUCATIONAL RESEARCH · Pub Date: 2021-04-03 · DOI: 10.1080/08957347.2021.1890743
Christine E. DeMars
ABSTRACT Estimation of parameters for the many-facets Rasch model requires that conditional on the values of the facets, such as person ability, item difficulty, and rater severity, the observed responses within each facet are independent. This requirement has often been discussed for the Rasch models and 2PL and 3PL models, but it becomes more complex with more facets. To show how violation of conditional independence may be exhibited, three scenarios with different types of dependency are developed: (a) raters rating the same work, (b) a residual ability shared by two tasks, and (c) score on one task dependent on observed score on a previous task.
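The conditional-independence requirement can be made concrete with a dichotomous many-facets sketch: given ability, item difficulty, and rater severities, the joint probability of a set of ratings factors into a product of single-rating probabilities. The dependency scenarios in the article (e.g., raters scoring the same work) are exactly the cases where this factorization fails. Function names below are illustrative.

```python
import math

def mfrm_p(theta, b, c):
    """Probability of a positive rating in a dichotomous many-facets
    Rasch model: logit = ability - item difficulty - rater severity."""
    return 1.0 / (1.0 + math.exp(-(theta - b - c)))

def joint_p(theta, b, severities, ratings):
    """Joint probability of several raters' 0/1 ratings under conditional
    independence: the product of the single-rating probabilities."""
    p = 1.0
    for c, x in zip(severities, ratings):
        pi = mfrm_p(theta, b, c)
        p *= pi if x == 1 else (1.0 - pi)
    return p

# Two equally severe raters, average examinee, average item
p_both = joint_p(0.0, 0.0, [0.0, 0.0], [1, 1])
```

If the two raters score the same piece of work, agreement is typically higher than this product predicts, which is the kind of local dependence the article demonstrates.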
Applied Measurement in Education, 34(1), 122–138.
Citations: 2
Coefficient β As Extension of KR-21 Reliability for Summed and Scaled Scores for Polytomously-scored Tests
IF 1.5 · Tier 4 (Education) · Q3 EDUCATION & EDUCATIONAL RESEARCH · Pub Date: 2021-04-03 · DOI: 10.1080/08957347.2021.1890740
Rashid S. Almehrizi
ABSTRACT KR-21 reliability and its extension (coefficient α) give the reliability estimate of test scores under the assumption of tau-equivalent forms. KR-21 reliability gives the reliability estimate for summed scores for dichotomous items when items are randomly sampled from an infinite pool of similar items (randomly parallel forms). The article extends KR-21 to coefficient β to estimate reliability of summed scores for tests with any item scoring pattern under the same assumption as KR-21. The article also presents coefficient β to estimate this reliability for nonlinearly scaled scores for any type of item. The article shows that coefficient β is slightly different from the classical index of reliability. Results show that coefficient β is not equal to the classical index of reliability for summed scores and scaled scores. Moreover, results using real data on psychological instruments reveal that different score scales yield different coefficient reliability estimates that are less than coefficient Gα by a function related to form-to-form differences in averages.
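For reference, the classical KR-21 formula that the paper extends needs only the number of items, the mean, and the variance of summed scores on dichotomous items:

```python
def kr21(n_items, mean, variance):
    """KR-21 reliability for summed scores on dichotomous items.

    n_items:  number of items k
    mean:     mean summed score M
    variance: variance of summed scores S^2
    KR-21 = (k / (k - 1)) * (1 - M * (k - M) / (k * S^2))
    """
    k = float(n_items)
    return (k / (k - 1.0)) * (1.0 - mean * (k - mean) / (k * variance))

# Example: a 20-item test with mean 10 and variance 9
rel = kr21(20, 10.0, 9.0)
```

KR-21 assumes all items are equally difficult, which is why it can be computed without item-level data; the paper's coefficient β generalizes this idea to polytomous scoring patterns and nonlinear score scales.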
Applied Measurement in Education, 34(1), 139–149.
Citations: 2
Examining Three Learning Progressions in Middle-school Mathematics for Formative Assessment
IF 1.5 · Tier 4 (Education) · Q3 EDUCATION & EDUCATIONAL RESEARCH · Pub Date: 2021-03-08 · DOI: 10.1080/08957347.2021.1890744
Duy N. Pham, C. Wells, Malcolm I. Bauer, E. Wylie, S. Monroe
ABSTRACT Assessments built on a theory of learning progressions are promising formative tools to support learning and teaching. The quality and usefulness of those assessments depend, in large part, on the validity of the theory-informed inferences about student learning made from the assessment results. In this study, we introduced an approach to address an important challenge related to examining theorized level links across progressions. We adopted the method to analyze response data for three learning progressions in middle-school mathematics: Equality and Variable, Functions and Linear Functions, and Proportional Reasoning. Multidimensional item response theory models were fit to the data to evaluate the postulated learning levels and level links. Our findings supported the theoretical hypotheses for Functions and Linear Functions, and Proportional Reasoning. Thus, items measuring these progressions can be used to develop formative assessment tasks, and assist instructional practices. Our findings did not support the theory underlying Equality and Variable. Implications for assessment developers and users, and future directions for research were discussed.
Applied Measurement in Education, 34(1), 107–121.
Citations: 0
Improving Test-Taking Effort in Low-Stakes Group-Based Educational Testing: A Meta-Analysis of Interventions
IF 1.5 · Tier 4 (Education) · Q3 EDUCATION & EDUCATIONAL RESEARCH · Pub Date: 2021-03-01 · DOI: 10.1080/08957347.2021.1890741
Joseph A. Rios
ABSTRACT Four decades of research have shown that students’ low test-taking effort is a serious threat to the validity of score-based inferences from low-stakes, group-based educational assessments. This meta-analysis sought to identify effective interventions for improving students’ test-taking effort in such contexts. Included studies (a) used a treatment-control group design; (b) administered a low-stakes group-based educational assessment; (c) employed an intervention to improve test-taking motivation; and (d) evaluated test-taking effort and/or test performance as outcomes. The analysis included 53 studies (N = 59,096) that produced 60 and 105 effect sizes of test-taking effort and test performance, respectively. On average, interventions were found to improve test-taking effort and test performance by 0.13 standard deviations (SD) each. The largest gains in test-taking effort were observed when providing external incentives followed by increasing test relevance, while no significant differences were found between these two intervention types in improving test performance. Furthermore, negligible impact was detected on both dependent variables for interventions that modified assessment design or promised feedback. Recommendations for future research and practice are discussed.
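The "0.13 standard deviations" reported above is a standardized mean difference between treatment and control groups. A common way to compute such an effect size is Hedges' g, sketched below (the meta-analysis does not state which small-sample correction it applied; this is one standard variant).

```python
def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference with Hedges' small-sample correction.

    mean_t/mean_c: treatment and control group means
    sd_t/sd_c:     group standard deviations
    n_t/n_c:       group sample sizes
    """
    # Pooled standard deviation
    sp = (((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2)
          / (n_t + n_c - 2)) ** 0.5
    d = (mean_t - mean_c) / sp          # Cohen's d
    j = 1.0 - 3.0 / (4.0 * (n_t + n_c) - 9.0)  # bias-correction factor
    return j * d

# Example: intervention group outscores control by half a pooled SD
g = hedges_g(0.5, 0.0, 1.0, 1.0, 50, 50)
```

Effect sizes like this, computed per study for test-taking effort and for test performance, are what get pooled to yield the average 0.13 SD improvement the abstract reports.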
Applied Measurement in Education, 34(1), 85–106.
Citations: 18
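The meta-analysis above reports intervention effects as standardized mean differences (e.g., gains of 0.13 SD). For readers unfamiliar with that metric, here is a minimal sketch of how a standardized mean difference (Cohen's d with a pooled SD) is computed for a treatment-control design; the numeric inputs are illustrative only and are not taken from the study.

```python
import math

def cohens_d(treat_mean, treat_sd, n_t, ctrl_mean, ctrl_sd, n_c):
    """Standardized mean difference (Cohen's d) with pooled SD
    for a two-group treatment-control design."""
    pooled_var = ((n_t - 1) * treat_sd ** 2 + (n_c - 1) * ctrl_sd ** 2) / (
        n_t + n_c - 2
    )
    return (treat_mean - ctrl_mean) / math.sqrt(pooled_var)

# Illustrative numbers only (not from the study):
d = cohens_d(52.6, 20.0, 100, 50.0, 20.0, 100)
print(round(d, 2))  # 0.13 — a 2.6-point gain on a 20-SD scale
```

A d of 0.13 means the average treated student scored 0.13 pooled standard deviations above the average control student, which is the scale on which the meta-analytic averages in the abstract are expressed.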
A Method for Identifying Partial Test-Taking Engagement
IF 1.5 CAS Zone 4, Education; Q3 EDUCATION & EDUCATIONAL RESEARCH Pub Date: 2021-03-01 DOI: 10.1080/08957347.2021.1890745
S. Wise, Megan Kuhfeld
ABSTRACT Effort-moderated (E-M) scoring is intended to estimate how well a disengaged test taker would have performed had they been fully engaged. It accomplishes this adjustment by excluding disengaged responses from scoring and estimating performance from the remaining responses. The scoring method, however, assumes that the remaining responses are not affected by disengagement. A recent study provided evidence that the responses used in E-M scoring can sometimes reflect less-than-full, or partial engagement. In these instances, E-M scores will be less trustworthy because they can still be distorted by disengagement. This paper describes a method for identifying partial engagement and provides validation evidence to support its use and interpretation. When test events indicate the presence of partial engagement, effort-moderated scores should be interpreted cautiously.
{"title":"A Method for Identifying Partial Test-Taking Engagement","authors":"S. Wise, Megan Kuhfeld","doi":"10.1080/08957347.2021.1890745","DOIUrl":"https://doi.org/10.1080/08957347.2021.1890745","url":null,"abstract":"ABSTRACT Effort-moderated (E-M) scoring is intended to estimate how well a disengaged test taker would have performed had they been fully engaged. It accomplishes this adjustment by excluding disengaged responses from scoring and estimating performance from the remaining responses. The scoring method, however, assumes that the remaining responses are not affected by disengagement. A recent study provided evidence that the responses used in E-M scoring can sometimes reflect less-than-full, or partial engagement. In these instances, E-M scores will be less trustworthy because they can still be distorted by disengagement. This paper describes a method for identifying partial engagement and provides validation evidence to support its use and interpretation. When test events indicate the presence of partial engagement, effort-moderated scores should be interpreted cautiously.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"34 1","pages":"150 - 161"},"PeriodicalIF":1.5,"publicationDate":"2021-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/08957347.2021.1890745","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44049497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
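The abstract above describes effort-moderated (E-M) scoring: disengaged (rapid-guess) responses are excluded, and ability is estimated from the remaining responses. The sketch below illustrates that two-step idea under the Rasch model, assuming item difficulties and per-item response-time thresholds are already known. The function names, the threshold rule, and the Newton-Raphson routine are illustrative assumptions, not the authors' implementation.

```python
import math

def rasch_mle(responses, difficulties, n_iter=50):
    """ML ability estimate under the Rasch model given known item
    difficulties. Undefined for all-correct or all-incorrect patterns."""
    theta = 0.0
    for _ in range(n_iter):
        p = [1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties]
        grad = sum(x - pi for x, pi in zip(responses, p))
        info = sum(pi * (1.0 - pi) for pi in p)
        theta += grad / info  # Newton-Raphson step
    return theta

def effort_moderated_score(responses, resp_times, difficulties, rt_thresholds):
    """E-M scoring sketch: drop responses whose response time falls below
    the item's rapid-guess threshold, then score the engaged remainder."""
    engaged = [
        (x, b)
        for x, t, b, thr in zip(responses, resp_times, difficulties, rt_thresholds)
        if t >= thr
    ]
    xs, bs = zip(*engaged)  # raises if no engaged responses remain
    return rasch_mle(list(xs), list(bs))
```

The paper's point is that this adjustment assumes the *retained* responses reflect full engagement; when engagement is only partial, even the responses that survive the threshold filter can distort the resulting score.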