Journal of Educational and Behavioral Statistics: Latest Articles

Reporting Proficiency Levels for Examinees With Incomplete Data
IF 2.4 Zone 3 Psychology Q2 EDUCATION & EDUCATIONAL RESEARCH Pub Date: 2021-10-24 DOI: 10.3102/10769986211051379
S. Sinharay
Takers of educational tests often receive proficiency levels instead of or in addition to scaled scores. For example, proficiency levels are reported for the Advanced Placement (AP®) and U.S. Medical Licensing examinations. Technical difficulties and other unforeseen events occasionally lead to missing item scores and hence to incomplete data on these tests. The reporting of proficiency levels to the examinees with incomplete data requires estimation of the performance of the examinees on the missing part and essentially involves imputation of missing data. In this article, six approaches from the literature on missing data analysis are brought to bear on the problem of reporting of proficiency levels to the examinees with incomplete data. Data from several large-scale educational tests are used to compare the performances of the six approaches to the approach that is operationally used for reporting proficiency levels for these tests. A multiple imputation approach based on chained equations is shown to lead to the most accurate reporting of proficiency levels for data that were missing at random or completely at random, while the model-based approach of Holman and Glas performed the best for data that are missing not at random. Several recommendations are made on the reporting of proficiency levels to the examinees with incomplete data.
Journal of Educational and Behavioral Statistics, vol. 47, pp. 263–296. https://doi.org/10.3102/10769986211051379
Citations: 6
Comparison of Within- and Between-Series Effect Estimates in the Meta-Analysis of Multiple Baseline Studies
IF 2.4 Zone 3 Psychology Q2 EDUCATION & EDUCATIONAL RESEARCH Pub Date: 2021-08-05 DOI: 10.3102/10769986211035507
Seang-Hwane Joo, Yan Wang, J. Ferron, S. N. Beretvas, Mariola Moeyaert, W. Van den Noortgate
Multiple baseline (MB) designs are becoming more prevalent in educational and behavioral research, and as they do, there is growing interest in combining effect size estimates across studies. To further refine the meta-analytic methods of estimating the effect, this study developed and compared eight alternative methods of estimating intervention effects from a set of MB studies. The methods differed in the assumptions made and varied in whether they relied on within- or between-series comparisons, modeled raw data or effect sizes, and did or did not standardize. Small sample functioning was examined through two simulation studies, which showed that when data were consistent with assumptions the bias was consistently less than 5% of the effect size for each method, whereas root mean squared error varied substantially across methods. When assumptions were violated, substantial biases were found. Implications and limitations are discussed.
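The two performance criteria named in the abstract, bias relative to the effect size and root mean squared error, can be illustrated with a minimal sketch. The helper names (`summarize_estimates`, `simulate`), the normal noise model, and all numeric values are assumptions for illustration only; they are not the article's simulation design.

```python
import math
import random


def summarize_estimates(estimates, true_effect):
    """Return (relative_bias, rmse) of estimates against a known true effect."""
    n = len(estimates)
    mean_estimate = sum(estimates) / n
    # Bias expressed as a fraction of the true effect size.
    relative_bias = (mean_estimate - true_effect) / true_effect
    # RMSE combines bias and sampling variability in one number.
    rmse = math.sqrt(sum((e - true_effect) ** 2 for e in estimates) / n)
    return relative_bias, rmse


def simulate(true_effect, noise_sd, n_reps, seed):
    """Hypothetical unbiased estimator: true effect plus normal noise."""
    rng = random.Random(seed)
    return [true_effect + rng.gauss(0.0, noise_sd) for _ in range(n_reps)]


true_effect = 2.0
precise = summarize_estimates(simulate(true_effect, 0.2, 2000, seed=1), true_effect)
noisy = summarize_estimates(simulate(true_effect, 1.0, 2000, seed=2), true_effect)
print("precise method: relative bias %.3f, RMSE %.3f" % precise)
print("noisy method:   relative bias %.3f, RMSE %.3f" % noisy)
```

Both hypothetical estimators are nearly unbiased, yet their RMSE differs by roughly a factor of five, which mirrors the pattern the abstract reports: small bias for every method alongside substantial variation in RMSE.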
Journal of Educational and Behavioral Statistics, vol. 47, pp. 131–166. https://doi.org/10.3102/10769986211035507
Citations: 3
Analyzing Cross-Sectionally Clustered Data Using Generalized Estimating Equations
IF 2.4 Zone 3 Psychology Q2 EDUCATION & EDUCATIONAL RESEARCH Pub Date: 2021-06-04 DOI: 10.3102/10769986211017480
Francis L. Huang
The presence of clustered data is common in the sociobehavioral sciences. One approach that specifically deals with clustered data but has seen little use in education is the generalized estimating equations (GEEs) approach. We provide a background on GEEs, discuss why it is appropriate for the analysis of clustered data, and provide worked examples using both continuous and binary outcomes. Comparisons are made between GEEs, multilevel models, and ordinary least squares results to highlight similarities and differences between the approaches. Detailed walkthroughs are provided using both R and SPSS Version 26.
Journal of Educational and Behavioral Statistics, vol. 47, pp. 101–125. https://doi.org/10.3102/10769986211017480
Citations: 12
Using Sequence Mining Techniques for Understanding Incorrect Behavioral Patterns on Interactive Tasks
IF 2.4 Zone 3 Psychology Q2 EDUCATION & EDUCATIONAL RESEARCH Pub Date: 2021-05-03 DOI: 10.3102/10769986211010467
Esther Ulitzsch, Qiwei He, S. Pohl
Interactive tasks designed to elicit real-life problem-solving behavior are rapidly becoming more widely used in educational assessment. Incorrect responses to such tasks can occur for a variety of different reasons such as low proficiency levels, low metacognitive strategies, or motivational issues. We demonstrate how behavioral patterns associated with incorrect responses can, in part, be understood, supporting insights into the different sources of failure on a task. To this end, we make use of sequence mining techniques that leverage the information contained in time-stamped action sequences commonly logged in assessments with interactive tasks for (a) investigating what distinguishes incorrect behavioral patterns from correct ones and (b) identifying subgroups of examinees with similar incorrect behavioral patterns. Analyzing a task from the Programme for the International Assessment of Adult Competencies 2012 assessment, we find incorrect behavioral patterns to be more heterogeneous than correct ones. We identify multiple subgroups of incorrect behavioral patterns, which point toward different levels of effort and lack of different subskills needed for solving the task. Albeit focusing on a single task, meaningful patterns of major differences in how examinees approach a given task that generalize across multiple tasks are uncovered. Implications for the construction and analysis of interactive tasks as well as the design of interventions for complex problem-solving skills are derived.
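One simple way to see what distinguishes incorrect action sequences from correct ones is to compare how often short runs of consecutive actions (n-grams) occur in each group. The sketch below assumes this n-gram approach, which is only one of many sequence-mining devices and is not necessarily the technique used in the article; the action names and toy log data are hypothetical.

```python
from collections import Counter


def ngrams(actions, n):
    """All runs of n consecutive actions in one logged sequence."""
    return [tuple(actions[i:i + n]) for i in range(len(actions) - n + 1)]


def ngram_support(sequences, n):
    """Fraction of sequences that contain each n-gram at least once."""
    counts = Counter()
    for seq in sequences:
        counts.update(set(ngrams(seq, n)))  # count each n-gram once per sequence
    total = len(sequences)
    return {gram: c / total for gram, c in counts.items()}


def distinguishing_ngrams(incorrect, correct, n, min_gap):
    """N-grams whose support among incorrect sequences exceeds that among correct ones."""
    sup_inc = ngram_support(incorrect, n)
    sup_cor = ngram_support(correct, n)
    gaps = {g: sup_inc.get(g, 0.0) - sup_cor.get(g, 0.0)
            for g in set(sup_inc) | set(sup_cor)}
    hits = [(g, d) for g, d in gaps.items() if d >= min_gap]
    return sorted(hits, key=lambda item: (-item[1], item[0]))


# Hypothetical action logs (timestamps omitted for brevity).
correct = [
    ["start", "open_mail", "sort", "submit"],
    ["start", "open_mail", "search", "sort", "submit"],
]
incorrect = [
    ["start", "submit"],
    ["start", "open_mail", "submit"],
    ["start", "submit"],
]

result = distinguishing_ngrams(incorrect, correct, n=2, min_gap=0.5)
print(result)
```

In this toy data the only bigram that clearly separates the groups is `("start", "submit")`, i.e. submitting without interacting with the task, which is the kind of low-effort pattern the abstract alludes to.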
Journal of Educational and Behavioral Statistics, vol. 47, pp. 3–35. https://doi.org/10.3102/10769986211010467
Citations: 15
A Case Study of Nonresponse Bias Analysis in Educational Assessment Surveys
IF 2.4 Zone 3 Psychology Q2 EDUCATION & EDUCATIONAL RESEARCH Pub Date: 2021-04-09 DOI: 10.3102/10769986221141074
Yajuan Si, R. Little, Ya Mo, N. Sedransk
Nonresponse bias is a widely prevalent problem for data on education. We develop a ten-step exemplar to guide nonresponse bias analysis (NRBA) in cross-sectional studies and apply these steps to the Early Childhood Longitudinal Study, Kindergarten Class of 2010–2011. A key step is the construction of indices of nonresponse bias based on proxy pattern-mixture models for survey variables of interest. A novel feature is to characterize the strength of evidence about nonresponse bias contained in these indices, based on the strength of the relationship between the characteristics in the nonresponse adjustment and the key survey variables. Our NRBA improves the existing methods by incorporating both missing at random and missing not at random mechanisms, and all analyses can be done straightforwardly with standard statistical software.
Journal of Educational and Behavioral Statistics, vol. 48, pp. 271–295. https://doi.org/10.3102/10769986221141074
Citations: 3
Validation Methods for Aggregate-Level Test Scale Linking: A Rejoinder
IF 2.4 Zone 3 Psychology Q2 EDUCATION & EDUCATIONAL RESEARCH Pub Date: 2021-04-01 DOI: 10.3102/1076998621994540
Andrew D. Ho, Sean F. Reardon, Demetra Kalogrides
In this issue, Reardon, Kalogrides, and Ho developed precision-adjusted random effects models to estimate aggregate-level linking error, for populations and subpopulations, for averages and progress over time. We are grateful to past editor Dan McCaffrey for selecting our paper as the focal article for a set of commentaries from our colleagues Daniel Bolt, Mark Davison, Alina von Davier, Tim Moses, and Neil Dorans. These commentaries reinforce important cautions and identify promising directions for future research. In this rejoinder, we clarify aspects of our originally proposed method. (1) Validation methods provide evidence of benefits and risks that different experts may weigh differently for different purposes. (2) Our proposed method differs from “standard mapping” procedures using the National Assessment of Educational Progress not only by using a linear (vs. equipercentile) link but also by targeting direct validity evidence about counterfactual aggregate scores. (3) Multilevel approaches that assume common score scales across states are indeed a promising next step for validation, and we hope that states enable researchers to use more of their common-core-era consortium test data for this purpose. Finally, we apply our linking method to an extended panel of data from 2009 to 2017 to show that linking recovery has remained stable.
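The linear (as opposed to equipercentile) link mentioned in point (2) can be sketched as a mean-and-standard-deviation transformation from one score scale to another. This is a minimal illustration of a generic linear link only; it does not reproduce the authors' precision-adjusted random effects models, and the function name and all numbers are hypothetical.

```python
def linear_link(score, src_mean, src_sd, dst_mean, dst_sd):
    """Map a score from a source scale to a destination scale.

    The score's standardized position on the source scale is preserved:
    (score - src_mean) / src_sd equals (linked - dst_mean) / dst_sd.
    """
    if src_sd <= 0 or dst_sd <= 0:
        raise ValueError("standard deviations must be positive")
    return dst_mean + dst_sd * (score - src_mean) / src_sd


# Hypothetical values: a state test with mean 300 and SD 40,
# and a national scale on which the same state has mean 250 and SD 30.
district_mean_on_state_scale = 320.0
linked = linear_link(district_mean_on_state_scale, 300.0, 40.0, 250.0, 30.0)
print(linked)  # 265.0
```

A district half a standard deviation above its state mean lands half a standard deviation above the state's mean on the destination scale. An equipercentile link would instead match percentile ranks, so the two can differ when the score distributions have different shapes.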
Journal of Educational and Behavioral Statistics, vol. 46, pp. 209–218. https://doi.org/10.3102/1076998621994540
Citations: 1
Introduction to JEBS Special Issue on NAEP Linked Aggregate Scores
IF 2.4 Zone 3 Psychology Q2 EDUCATION & EDUCATIONAL RESEARCH Pub Date: 2021-04-01 DOI: 10.3102/10769986211001480
D. McCaffrey, S. Culpepper
The Stanford Education Data Archive (SEDA) was created by Sean Reardon, Andrew Ho, Demetra Kalogrides, and their colleagues using annual state summative test score data retrieved from the EDFacts Restricted-Use Files and publicly available NAEP data from the National Center for Education Statistics. SEDA provides test score data on a common scale across all states for mathematics and reading language arts for students in Grades 3 through 8 for almost all schools, districts, and counties in the United States. An online tool (edopportunity.org) allows users to visually compare schools and districts from anywhere in the country. Data also include various covariates at each of these levels, and all the data can be downloaded for free for analysis. These data have the potential to be a very valuable resource for researchers, educators, policy makers, and possibly even the general public. The catch is that there is no common standardized test administered to students in Grades 3 through 8 in all schools and school districts in all states. NAEP is administered only in a relatively small sample of schools in each state, only to students in Grades 4 and 8, and only every other year. The school data in SEDA are derived from the annual tests administered by each state in accordance with federal regulations. Reardon, Ho, Kalogrides, and colleagues start with aggregate data on the numbers of students in each school or district meeting various performance levels on their state standardized tests. State tests are on different scales and test somewhat different content. They also use cutoffs for performance levels that are not common across states. Reardon, Ho, Kalogrides, and colleagues convert these frequencies to means and standard deviations for the scores in each school or district using the Heteroskedastic Ordered Probit model that was developed in a series of papers in JEBS (Lockwood et al., 2018; Reardon et al., 2017; Shear & Reardon, 2021). They then link these means and standard deviations to the NAEP scale using methods described in Reardon et al. (2021). Reardon, Ho, Kalogrides, and colleagues stitched together a collection of methods to create a national data source of …
Journal of Educational and Behavioral Statistics 2021, Vol. 46, No. 2, pp. 135–137. DOI: 10.3102/10769986211001480. © 2021 AERA.
Citations: 1
A Rating Scale Mixture Model to Account for the Tendency to Middle and Extreme Categories
IF 2.4 Zone 3 Psychology Q2 EDUCATION & EDUCATIONAL RESEARCH Pub Date: 2021-03-31 DOI: 10.3102/1076998621992554
R. Colombi, S. Giordano, G. Tutz
A mixture of logit models is proposed that discriminates between responses to rating questions that are affected by a tendency to prefer middle or extremes of the scale regardless of the content of the item (response styles) and purely content-driven preferences. Explanatory variables are used to characterize the content-driven way of answering as well as the tendency to middle or extreme categories. The proposed model is extended to account for the presence of response styles in the case of several items, and the association among responses is described, both when they are content driven or dictated by response styles. In addition, stochastic orderings, related to the tendency to select middle or extreme categories, are introduced and investigated. A simulation study describes the effectiveness of the proposed model, and an application to a questionnaire on attitudes toward ethnic minorities illustrates the applicability of the modeling approach.
我们提出了一种混合的logit模型,用于区分对评分问题的回答,这些问题受不考虑项目内容(回答风格)而倾向于选择量表的中间或极端的倾向的影响,以及纯粹的内容驱动的偏好。解释变量用于描述内容驱动的回答方式,以及趋向于中间或极端类别。所建议的模型被扩展为在多个项目的情况下考虑响应风格的存在,并且描述了响应之间的关联,无论是内容驱动的还是由响应风格决定的。此外,引入并研究了与选择中间或极端类别的倾向有关的随机排序。仿真研究表明了该模型的有效性,并通过对少数民族态度问卷调查的应用说明了该模型方法的适用性。
{"title":"A Rating Scale Mixture Model to Account for the Tendency to Middle and Extreme Categories","authors":"R. Colombi, S. Giordano, G. Tutz","doi":"10.3102/1076998621992554","DOIUrl":"https://doi.org/10.3102/1076998621992554","url":null,"abstract":"A mixture of logit models is proposed that discriminates between responses to rating questions that are affected by a tendency to prefer middle or extremes of the scale regardless of the content of the item (response styles) and purely content-driven preferences. Explanatory variables are used to characterize the content-driven way of answering as well as the tendency to middle or extreme categories. The proposed model is extended to account for the presence of response styles in the case of several items, and the association among responses is described, both when they are content driven or dictated by response styles. In addition, stochastic orderings, related to the tendency to select middle or extreme categories, are introduced and investigated. A simulation study describes the effectiveness of the proposed model, and an application to a questionnaire on attitudes toward ethnic minorities illustrates the applicability of the modeling approach.","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":"46 1","pages":"682 - 716"},"PeriodicalIF":2.4,"publicationDate":"2021-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45838221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Detecting Noneffortful Responses Based on a Residual Method Using an Iterative Purification Process
IF 2.4 Zone 3 Psychology Q2 EDUCATION & EDUCATIONAL RESEARCH Pub Date: 2021-03-29 DOI: 10.3102/1076998621994366
Yue Liu, Hongyun Liu
The prevalence and serious consequences of noneffortful responses from unmotivated examinees are well known in educational measurement. In this study, we propose to apply an iterative purification process based on a response time residual method with fixed item parameter estimates to detect noneffortful responses. The proposed method is compared with the traditional residual method and a noniterative method with fixed item parameters in two simulation studies in terms of noneffort detection accuracy and parameter recovery. The results show that when the severity of noneffort is high, the proposed method leads to a much higher true positive rate with a small increase in the false discovery rate. In addition, parameter estimation is significantly improved by the strategies of fixing item parameters and iterative cleansing. These results suggest that the proposed method is a potential solution to reduce the impact of data contamination due to severely low test-taking effort and to obtain more accurate parameter estimates. An empirical study is also conducted to show the differences in the detection rate and parameter estimates among different approaches.
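As a rough illustration of what iterative purification with fixed item parameters can look like, the sketch below flags unusually fast responses for a single examinee. It is not the authors' procedure: it assumes a simple lognormal response time model in which the expected log time on item j is `beta[j] - tau`, and the names `flag_rapid_responses`, `beta`, `sigma`, `tau` and the cutoff `crit` are all hypothetical choices made for this example.

```python
def flag_rapid_responses(log_times, beta, sigma, crit=-1.645, max_iter=20):
    """Flag unusually fast responses for one examinee.

    log_times: log response times, one per item.
    beta, sigma: fixed (assumed known) item time intensities and residual SDs.
    crit: hypothetical cutoff on the standardized residual.
    Returns a list of booleans, True where a response is flagged.
    """
    n = len(log_times)
    flags = [False] * n
    for _ in range(max_iter):
        kept = [j for j in range(n) if not flags[j]]
        if not kept:
            break
        # Estimate person speed from the currently unflagged responses only.
        tau = sum(beta[j] - log_times[j] for j in kept) / len(kept)
        # Standardized residual: observed minus expected log time.
        z = [(log_times[j] - (beta[j] - tau)) / sigma[j] for j in range(n)]
        new_flags = [zj < crit for zj in z]
        if new_flags == flags:
            break  # flags are stable, so purification has converged
        flags = new_flags
    return flags


flags = flag_rapid_responses(
    log_times=[4.0, 4.0, 4.0, 4.0, 1.0, 2.8],
    beta=[4.0] * 6,
    sigma=[0.5] * 6,
)
print(flags)
```

In this toy input the very fast fifth response inflates the first speed estimate, so the sixth response is only flagged after the fifth has been excluded and the speed re-estimated. That masking effect is the motivation for iterating rather than flagging in a single pass.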
Journal of Educational and Behavioral Statistics, vol. 46, pp. 717-752. Published 2021-03-29. https://doi.org/10.3102/1076998621994366
Citations: 2
Item Characteristic Curve Asymmetry: A Better Way to Accommodate Slips and Guesses Than a Four-Parameter Model?
IF 2.4 Zone 3 Psychology Q2 EDUCATION & EDUCATIONAL RESEARCH Pub Date: 2021-03-29 DOI: 10.3102/10769986211003283
Xiangyi Liao, D. Bolt
Four-parameter models have received increasing psychometric attention in recent years, as a reduced upper asymptote for item characteristic curves can be appealing for measurement applications such as adaptive testing and person-fit assessment. However, applications can be challenging due to the large number of parameters in the model. In this article, we demonstrate in the context of mathematics assessments how the slip and guess parameters of a four-parameter model may often be empirically related. This observation also has a psychological explanation to the extent that both asymptote parameters may be manifestations of a single item complexity characteristic. The relationship between lower and upper asymptotes motivates the consideration of an asymmetric item response theory model as a three-parameter alternative to the four-parameter model. Using actual response data from mathematics multiple-choice tests, we demonstrate the empirical superiority of a three-parameter asymmetric model in several standardized tests of mathematics. To the extent that a model of asymmetry ultimately portrays slips and guesses not as purely random but rather as proficiency-related phenomena, we argue that the asymmetric approach may also have greater psychological plausibility.
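For readers unfamiliar with the four-parameter model mentioned in this abstract, the following minimal sketch evaluates the standard four-parameter logistic item characteristic curve, with lower asymptote `c` (guessing) and upper asymptote `d` (one minus the slip probability). The function name `icc_4pl` and the parameter values are illustrative assumptions, and the asymmetric alternative studied in the article is not reproduced here.

```python
import math


def icc_4pl(theta, a, b, c, d):
    """Probability of a correct response under the four-parameter logistic model.

    theta: examinee proficiency
    a: discrimination, b: difficulty
    c: lower asymptote (guessing), d: upper asymptote (1 - slip)
    """
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))


# Illustrative (made-up) item: 20% guessing floor, 10% slip ceiling.
for theta in (-3.0, 0.0, 3.0):
    print(theta, round(icc_4pl(theta, a=1.5, b=0.0, c=0.2, d=0.9), 3))
```

At `theta == b` the curve sits exactly halfway between the two asymptotes, which reflects the symmetry of the logistic form that an asymmetric model relaxes.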
Journal of Educational and Behavioral Statistics, vol. 46, pp. 753-775. Published 2021-03-29. https://doi.org/10.3102/10769986211003283
Citations: 6