
Journal of Educational and Behavioral Statistics: Latest Articles

A Case Study of Nonresponse Bias Analysis in Educational Assessment Surveys
IF 2.4; Zone 3 (Psychology); Q2 (Education & Educational Research); Pub Date: 2021-04-09; DOI: 10.3102/10769986221141074
Yajuan Si, R. Little, Ya Mo, N. Sedransk
Nonresponse bias is a widely prevalent problem for data on education. We develop a ten-step exemplar to guide nonresponse bias analysis (NRBA) in cross-sectional studies and apply these steps to the Early Childhood Longitudinal Study, Kindergarten Class of 2010–2011. A key step is the construction of indices of nonresponse bias based on proxy pattern-mixture models for survey variables of interest. A novel feature is to characterize the strength of evidence about nonresponse bias contained in these indices, based on the strength of the relationship between the characteristics in the nonresponse adjustment and the key survey variables. Our NRBA improves the existing methods by incorporating both missing at random and missing not at random mechanisms, and all analyses can be done straightforwardly with standard statistical software.
Journal of Educational and Behavioral Statistics, Vol. 48, pp. 271–295.
Citations: 3
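The abstract above mentions indices of nonresponse bias built from proxy pattern-mixture models. As a rough illustration only, the sketch below computes a commonly used form of the proxy pattern-mixture estimate of a survey mean. It is an assumption about the general technique, not the authors' code: the function name `ppm_mean` and every argument name are hypothetical. In this form, `phi = 0` corresponds to missing at random, and larger values of `phi` move toward missing not at random.

```python
def ppm_mean(ybar_resp, sy_resp, xbar_resp, sx_resp, xbar_all, rho, phi):
    """Proxy pattern-mixture estimate of the mean of a survey variable Y.

    ybar_resp, sy_resp : respondent mean and SD of Y
    xbar_resp, sx_resp : respondent mean and SD of the proxy X
    xbar_all           : proxy mean over respondents and nonrespondents
    rho                : respondent correlation between X and Y (0 < rho <= 1)
    phi                : sensitivity parameter in [0, 1]; 0 means MAR
    """
    if not 0.0 <= phi <= 1.0:
        raise ValueError("phi must lie in [0, 1]")
    if not 0.0 < rho <= 1.0:
        raise ValueError("rho must lie in (0, 1]")
    # Adjustment factor: equals rho at phi = 0 and 1 / rho at phi = 1.
    g = (phi + (1.0 - phi) * rho) / (phi * rho + (1.0 - phi))
    return ybar_resp + g * (sy_resp / sx_resp) * (xbar_all - xbar_resp)


def ppm_bias_index(ybar_resp, sy_resp, xbar_resp, sx_resp, xbar_all, rho, phi):
    """Estimated bias of the respondent mean: respondent mean minus PPM mean."""
    adjusted = ppm_mean(ybar_resp, sy_resp, xbar_resp, sx_resp, xbar_all, rho, phi)
    return ybar_resp - adjusted
```

A weak proxy (small `rho`) makes the estimate swing widely as `phi` varies, which is one way to read the abstract's point about the strength of evidence carried by such indices.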
Introduction to JEBS Special Issue on NAEP Linked Aggregate Scores
IF 2.4; Zone 3 (Psychology); Q2 (Education & Educational Research); Pub Date: 2021-04-01; DOI: 10.3102/10769986211001480
D. McCaffrey, S. Culpepper
The Stanford Education Data Archive (SEDA) was created by Sean Reardon, Andrew Ho, Demetra Kalogrides, and their colleagues using annual state summative test score data retrieved from the EDFacts Restricted-Use Files and publicly available NAEP data from the National Center for Education Statistics. SEDA provides test score data on a common scale across all states for mathematics and reading language arts for students in Grades 3 through 8 for almost all schools, districts, and counties in the United States. An online tool (edopportunity.org) allows users to visually compare schools and districts from anywhere in the country. Data also include various covariates at each of these levels, and all the data can be downloaded for free for analysis. These data have the potential to be a very valuable resource for researchers, educators, policy makers, and possibly even the general public. The catch is that there is no common standardized test administered to students in Grades 3 through 8 in all schools and school districts in all states. NAEP is administered only in a relatively small sample of schools in each state, only to students in Grades 4 and 8, and only every other year. The school data in SEDA are derived from the annual tests administered by each state in accordance with federal regulations. Reardon, Ho, Kalogrides, and colleagues start with aggregate data on the numbers of students in each school or district meeting various performance levels on their state standardized tests. State tests are on different scales and test somewhat different content. They also use performance-level cutoffs that differ across states. Reardon, Ho, Kalogrides, and colleagues convert these frequencies to means and standard deviations for the scores in each school or district using the Heteroskedastic Ordered Probit model that was developed in a series of papers in JEBS (Lockwood et al., 2018; Reardon et al., 2017; Shear & Reardon, 2021). They then link these means and standard deviations to the NAEP scale using methods described in Reardon et al. (2021). Reardon, Ho, Kalogrides, and colleagues stitched together a collection of methods to create a national data source of …
Journal of Educational and Behavioral Statistics, 2021, Vol. 46, No. 2, pp. 135–137. © 2021 AERA.
Citations: 1
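The abstract above describes converting counts of students in performance levels into a mean and standard deviation. The sketch below shows the core idea for a single group with three performance levels and two cutoffs that are assumed known on the target scale. This is a deliberate simplification: the Heteroskedastic Ordered Probit model named in the abstract estimates cutoffs and group parameters jointly across many groups, which is not attempted here. The function name `group_mean_sd_from_counts` is hypothetical.

```python
from statistics import NormalDist


def group_mean_sd_from_counts(counts, cutoffs):
    """Recover a normal mean and SD from counts in three ordered categories.

    counts  : (n_below_c1, n_between, n_above_c2), each greater than zero
    cutoffs : (c1, c2) with c1 < c2, assumed known on the target scale
    """
    if len(counts) != 3 or len(cutoffs) != 2:
        raise ValueError("expected three counts and two cutoffs")
    if min(counts) <= 0 or cutoffs[0] >= cutoffs[1]:
        raise ValueError("counts must be positive and cutoffs increasing")
    total = sum(counts)
    # Cumulative proportions at or below each cutoff.
    p1 = counts[0] / total
    p2 = (counts[0] + counts[1]) / total
    std_normal = NormalDist()
    z1 = std_normal.inv_cdf(p1)
    z2 = std_normal.inv_cdf(p2)
    # Solve (c1 - mean) / sd = z1 and (c2 - mean) / sd = z2.
    sd = (cutoffs[1] - cutoffs[0]) / (z2 - z1)
    mean = cutoffs[0] - sd * z1
    return mean, sd
```

With only two cutoffs the two parameters are exactly identified, so the sketch solves two equations rather than maximizing a likelihood.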
Validation Methods for Aggregate-Level Test Scale Linking: A Rejoinder
IF 2.4; Zone 3 (Psychology); Q2 (Education & Educational Research); Pub Date: 2021-04-01; DOI: 10.3102/1076998621994540
Andrew D. Ho, Sean F. Reardon, Demetra Kalogrides
In this issue, Reardon, Kalogrides, and Ho developed precision-adjusted random effects models to estimate aggregate-level linking error, for populations and subpopulations, for averages and progress over time. We are grateful to past editor Dan McCaffrey for selecting our paper as the focal article for a set of commentaries from our colleagues Daniel Bolt, Mark Davison, Alina von Davier, Tim Moses, and Neil Dorans. These commentaries reinforce important cautions and identify promising directions for future research. In this rejoinder, we clarify aspects of our originally proposed method. (1) Validation methods provide evidence of benefits and risks that different experts may weigh differently for different purposes. (2) Our proposed method differs from “standard mapping” procedures using the National Assessment of Educational Progress not only by using a linear (vs. equipercentile) link but also by targeting direct validity evidence about counterfactual aggregate scores. (3) Multilevel approaches that assume common score scales across states are indeed a promising next step for validation, and we hope that states enable researchers to use more of their common-core-era consortium test data for this purpose. Finally, we apply our linking method to an extended panel of data from 2009 to 2017 to show that linking recovery has remained stable.
Journal of Educational and Behavioral Statistics, Vol. 46, pp. 209–218.
Citations: 1
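The rejoinder above contrasts a linear link with an equipercentile link. As a minimal sketch of what a linear link does, the functions below map a district mean and SD expressed in state-standardized units onto the NAEP scale using the state's NAEP mean and SD. The function and argument names are hypothetical, and the sketch ignores estimation error in both inputs, which the authors' validation work is concerned with.

```python
def link_mean_to_naep(district_mean_std, state_naep_mean, state_naep_sd):
    """Linear link for a mean: shift and rescale a state-standardized mean."""
    return state_naep_mean + state_naep_sd * district_mean_std


def link_sd_to_naep(district_sd_std, state_naep_sd):
    """Linear link for an SD: rescale only, since shifts do not affect spread."""
    return state_naep_sd * district_sd_std
```

For example, a district 0.25 state SDs above its state average, in a state with NAEP mean 240 and SD 30, would be placed at 247.5 on the NAEP scale under this link.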
A Rating Scale Mixture Model to Account for the Tendency to Middle and Extreme Categories
IF 2.4; Zone 3 (Psychology); Q2 (Education & Educational Research); Pub Date: 2021-03-31; DOI: 10.3102/1076998621992554
R. Colombi, S. Giordano, G. Tutz
A mixture of logit models is proposed that discriminates between responses to rating questions that are affected by a tendency to prefer the middle or the extremes of the scale regardless of the content of the item (response styles) and responses that reflect purely content-driven preferences. Explanatory variables are used to characterize the content-driven way of answering as well as the tendency to middle or extreme categories. The proposed model is extended to account for the presence of response styles in the case of several items, and the association among responses is described, both when they are content driven and when they are dictated by response styles. In addition, stochastic orderings, related to the tendency to select middle or extreme categories, are introduced and investigated. A simulation study describes the effectiveness of the proposed model, and an application to a questionnaire on attitudes toward ethnic minorities illustrates the applicability of the modeling approach.
Journal of Educational and Behavioral Statistics, Vol. 46, pp. 682–716.
Citations: 2
Detecting Noneffortful Responses Based on a Residual Method Using an Iterative Purification Process
IF 2.4; Zone 3 (Psychology); Q2 (Education & Educational Research); Pub Date: 2021-03-29; DOI: 10.3102/1076998621994366
Yue Liu, Hongyun Liu
The prevalence and serious consequences of noneffortful responses from unmotivated examinees are well-known in educational measurement. In this study, we propose to apply an iterative purification process based on a response time residual method with fixed item parameter estimates to detect noneffortful responses. The proposed method is compared with the traditional residual method and the noniterative method with fixed item parameters in two simulation studies in terms of noneffort detection accuracy and parameter recovery. The results show that when the severity of noneffort is high, the proposed method leads to a much higher true positive rate with a small increase in the false discovery rate. In addition, parameter estimation is significantly improved by the strategies of fixing item parameters and iterative cleansing. These results suggest that the proposed method is a potential solution to reduce the impact of data contamination due to severely low test-taking effort and to obtain more accurate parameter estimates. An empirical study is also conducted to show the differences in the detection rate and parameter estimates among different approaches.
Journal of Educational and Behavioral Statistics, Vol. 46, pp. 717–752.
Citations: 2
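The abstract above describes an iterative purification process on response time residuals with fixed item parameters. The sketch below is one plausible reading of that idea for a single examinee, under assumptions that are not taken from the paper: a lognormal response time model in which the expected log time is the item time intensity minus the person speed, a one-sided cutoff on the standardized residual, and a simple mean-based speed estimate. The function name `flag_rapid_responses` and all parameter names are hypothetical.

```python
import math


def flag_rapid_responses(times, beta, sigma, crit=1.645, max_iter=20):
    """Flag unusually fast responses for one examinee by iterative purification.

    times : response times in seconds, one per item (all > 0)
    beta  : fixed item time intensities on the log-seconds scale
    sigma : fixed item residual SDs on the log-seconds scale
    crit  : one-sided cutoff applied to the standardized residual
    """
    n_items = len(times)
    flagged = [False] * n_items
    for _ in range(max_iter):
        kept = [j for j in range(n_items) if not flagged[j]]
        if not kept:
            break
        # Person speed estimated only from responses not currently flagged.
        tau = sum(beta[j] - math.log(times[j]) for j in kept) / len(kept)
        # Standardized residual of each log time against its expectation.
        new_flags = [
            (math.log(times[j]) - (beta[j] - tau)) / sigma[j] < -crit
            for j in range(n_items)
        ]
        if new_flags == flagged:
            break  # flags are stable, so purification has converged
        flagged = new_flags
    return flagged
```

The point of iterating is that rapid responses inflate the first speed estimate, so re-estimating speed without the flagged responses gives cleaner residuals on the next pass.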
Item Characteristic Curve Asymmetry: A Better Way to Accommodate Slips and Guesses Than a Four-Parameter Model?
IF 2.4; Zone 3 (Psychology); Q2 (Education & Educational Research); Pub Date: 2021-03-29; DOI: 10.3102/10769986211003283
Xiangyi Liao, D. Bolt
Four-parameter models have received increasing psychometric attention in recent years, as a reduced upper asymptote for item characteristic curves can be appealing for measurement applications such as adaptive testing and person-fit assessment. However, applications can be challenging due to the large number of parameters in the model. In this article, we demonstrate in the context of mathematics assessments how the slip and guess parameters of a four-parameter model may often be empirically related. This observation also has a psychological explanation to the extent that both asymptote parameters may be manifestations of a single item complexity characteristic. The relationship between lower and upper asymptotes motivates the consideration of an asymmetric item response theory model as a three-parameter alternative to the four-parameter model. Using actual response data from mathematics multiple-choice tests, we demonstrate the empirical superiority of a three-parameter asymmetric model in several standardized tests of mathematics. To the extent that a model of asymmetry ultimately portrays slips and guesses not as purely random but rather as proficiency-related phenomena, we argue that the asymmetric approach may also have greater psychological plausibility.
Journal of Educational and Behavioral Statistics, Vol. 46, pp. 753–775.
Citations: 6
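For readers unfamiliar with the four-parameter model discussed in the abstract above, the sketch below evaluates the standard four-parameter logistic item characteristic curve, in which the lower asymptote plays the role of the guess parameter and one minus the upper asymptote plays the role of the slip parameter. It does not implement the asymmetric three-parameter alternative the authors propose, whose exact form is not given in the abstract. The function name `icc_4pl` is hypothetical.

```python
import math


def icc_4pl(theta, a, b, c, d):
    """Probability of a correct response under the four-parameter logistic model.

    theta : examinee proficiency
    a     : discrimination
    b     : difficulty
    c     : lower asymptote (guessing)
    d     : upper asymptote (1 - d is the slip probability)
    """
    if not 0.0 <= c < d <= 1.0:
        raise ValueError("asymptotes must satisfy 0 <= c < d <= 1")
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))
```

At `theta == b` the curve sits exactly halfway between the two asymptotes, which is the symmetry that an asymmetric model relaxes.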
Monitoring Item Performance With CUSUM Statistics in Continuous Testing
IF 2.4; Zone 3 (Psychology); Q2 (Education & Educational Research); Pub Date: 2021-03-08; DOI: 10.3102/1076998621994563
Yi-Hsuan Lee, C. Lewis
In many educational assessments, items are reused in different administrations throughout the life of the assessments. Ideally, a reused item should perform relatively similarly over time. In reality, an item may become easier with exposure, especially when item preknowledge has occurred. This article presents a novel cumulative sum procedure for detecting item preknowledge in continuous testing where data for each reused item may be obtained from small and varying sample sizes across administrations. Its performance is evaluated with simulations and analytical work. The approach is effective in detecting item preknowledge quickly with group sizes of at least 10 and is easy to implement with varying item parameters. In addition, it is robust to the ability estimation error introduced in the simulations.
Journal of Educational and Behavioral Statistics, Vol. 46, pp. 611–648.
Citations: 3
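The cumulative sum idea in the abstract above can be sketched with a textbook one-sided CUSUM applied to one standardized residual per administration. This is the generic chart, not the specific statistic the authors develop. The reference value `k`, the threshold `h`, and both function names are illustrative assumptions.

```python
import math


def standardized_residual(responses, probs):
    """Standardize an item's observed number correct in one administration.

    responses : 0/1 scores of the examinees who saw the item
    probs     : model-implied probabilities of a correct response
    """
    expected = sum(probs)
    variance = sum(p * (1.0 - p) for p in probs)
    if variance <= 0.0:
        raise ValueError("variance must be positive")
    return (sum(responses) - expected) / math.sqrt(variance)


def cusum_upper(z_scores, k=0.5, h=4.0):
    """One-sided upper CUSUM over a sequence of standardized residuals.

    Returns the CUSUM path and the index of the first administration at
    which the path exceeds h, or None if it never does.
    """
    s = 0.0
    path = []
    first_signal = None
    for t, z in enumerate(z_scores):
        # Accumulate evidence that the item is easier than expected.
        s = max(0.0, s + z - k)
        path.append(s)
        if first_signal is None and s > h:
            first_signal = t
    return path, first_signal
```

Because each administration contributes a standardized value, small and unequal group sizes can be fed into the same chart, which matches the setting the abstract describes.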
Jenss–Bayley Latent Change Score Model With Individual Ratio of the Growth Acceleration in the Framework of Individual Measurement Occasions
IF 2.4; Zone 3 (Psychology); Q2 (Education & Educational Research); Pub Date: 2021-02-27; DOI: 10.3102/10769986221099919
Jin Liu
Longitudinal data analysis has been widely employed to examine between-individual differences in within-individual changes. One challenge of such analyses is that the rate-of-change is only available indirectly when change patterns are nonlinear with respect to time. Latent change score models (LCSMs), which can be employed to investigate the change in rate-of-change at the individual level, have been developed to address this challenge. We extend an existing LCSM with the Jenss–Bayley growth curve and propose a novel expression for change scores that allows for (1) unequally spaced study waves and (2) individual measurement occasions around each wave. We also extend the existing model to estimate the individual ratio of the growth acceleration (that largely determines the trajectory shape and is viewed as the most important parameter in the Jenss–Bayley model). We illustrate the proposed model with a simulation study and a real-world data analysis. Our simulation study demonstrates that the proposed model can estimate the parameters unbiasedly and precisely and exhibits target confidence interval coverage. The simulation study also shows that the proposed model with the novel expression for the change scores outperforms the existing model. An empirical example using longitudinal reading scores shows that the model can estimate the individual ratio of the growth acceleration and generate individual rate-of-change in practice. We also provide the corresponding code for the proposed model.
纵向数据分析已被广泛用于检验个体内部变化中的个体间差异。这种分析的一个挑战是,只有当变化模式相对于时间是非线性的时候,变化率才能间接得到。为了应对这一挑战,人们开发了潜在变化评分模型(lcsm),该模型可用于研究个人水平上的变化率变化。我们用Jenss-Bayley生长曲线扩展了现有的LCSM,并提出了一种新的变化分数表达式,该表达式允许(1)不均匀间隔的学习波和(2)每个波周围的单独测量场合。我们还扩展了现有模型来估计生长加速的个体比率(这在很大程度上决定了轨迹形状,被视为Jenss-Bayley模型中最重要的参数)。我们通过仿真研究和实际数据分析提出了该模型。仿真研究表明,该模型能够准确、无偏地估计参数,并具有目标置信区间覆盖率。仿真研究还表明,采用新的变化分数表达式的模型优于现有的模型。通过纵向阅读分数的实证分析表明,该模型在实际应用中能够较好地估计个体的增长加速比,并生成个体的变化速率。我们还为提议的模型提供了相应的代码。
Journal of Educational and Behavioral Statistics, vol. 47, pp. 507–543.
Citations: 3
Estimating Difference-Score Reliability in Pretest–Posttest Settings
IF 2.4 Zone 3 Psychology Q2 EDUCATION & EDUCATIONAL RESEARCH Pub Date: 2021-02-15 DOI: 10.3102/1076998620986948
Zhengguo Gu, W. Emons, K. Sijtsma
Clinical, medical, and health psychologists use difference scores obtained from pretest–posttest designs employing the same test to assess intraindividual change possibly caused by an intervention addressing, for example, anxiety, depression, eating disorder, or addiction. Reliability of difference scores is important for interpreting observed change. This article compares the well-documented traditional method and the unfamiliar, rarely used item-level method for estimating difference-score reliability. We simulated data under various conditions that are typical of change assessment in pretest–posttest designs. The item-level method had smaller bias and greater precision than the traditional method and may be recommended for practical use.
Journal of Educational and Behavioral Statistics, vol. 46, pp. 592–610.
Citations: 2
Statistical Power for Estimating Treatment Effects Using Difference-in-Differences and Comparative Interrupted Time Series Estimators With Variation in Treatment Timing
IF 2.4 Zone 3 Psychology Q2 EDUCATION & EDUCATIONAL RESEARCH Pub Date: 2021-02-12 DOI: 10.3102/10769986211070625
Peter Z. Schochet
This article develops new closed-form variance expressions for power analyses for commonly used difference-in-differences (DID) and comparative interrupted time series (CITS) panel data estimators. The main contribution is to incorporate variation in treatment timing into the analysis. The power formulas also account for other key design features that arise in practice: autocorrelated errors, unequal measurement intervals, and clustering due to the unit of treatment assignment. We consider power formulas for both cross-sectional and longitudinal models and allow for covariates. An illustrative power analysis provides guidance on appropriate sample sizes. The key finding is that accounting for treatment timing increases required sample sizes. Further, DID estimators have considerably more power than standard CITS and ITS estimators. An available Shiny R dashboard performs the sample size calculations for the considered estimators.
Journal of Educational and Behavioral Statistics, vol. 47, pp. 367–405.
Citations: 6