
Journal of Educational and Behavioral Statistics: Latest Articles

The Rank-2PL IRT Models for Forced-Choice Questionnaires: Maximum Marginal Likelihood Estimation with an EM Algorithm.
IF 1.7 | Zone 3, Psychology | Q2 EDUCATION & EDUCATIONAL RESEARCH | Pub Date: 2025-06-01 | Epub Date: 2024-06-18 | DOI: 10.3102/10769986241256030
Jianbin Fu, Xuan Tan, Patrick C Kyllonen
Volume 50, Issue 3, pp. 497-525. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12379955/pdf/
Citations: 0
Approaches to Statistical Efficiency When Comparing the Embedded Adaptive Interventions in a SMART.
IF 1.7 | Zone 3, Psychology | Q2 EDUCATION & EDUCATIONAL RESEARCH | Pub Date: 2025-06-01 | Epub Date: 2024-05-28 | DOI: 10.3102/10769986241251419
Timothy Lycurgus, Amy Kilbourne, Daniel Almirall

Sequential, multiple assignment randomized trials (SMARTs), which assist in the optimization of adaptive interventions, are growing in popularity in education and behavioral sciences. This is unsurprising, as adaptive interventions reflect the sequential, tailored nature of learning in a classroom or school. Nonetheless, as is true elsewhere in education research, observed effect sizes in education-based SMARTs are frequently small. As a consequence, statistical efficiency is of paramount importance in their analysis. The contributions of this manuscript are twofold. First, we provide an overview of adaptive interventions and SMART designs for researchers in education science. Second, we propose four techniques that have the potential to improve statistical efficiency in the analysis of SMARTs. We demonstrate the benefits of these techniques in SMART settings both through the analysis of a SMART designed to optimize an adaptive intervention for increasing cognitive behavioral therapy delivery in school settings and through a comprehensive simulation study. Each of the proposed techniques is easily implementable, either with over-the-counter statistical software or through R code provided in Supplemental Material.

Volume 50, Issue 3, pp. 420-448. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12594441/pdf/
Citations: 0
Mixed-Effects Location Scale Models for Joint Modeling School Value-Added Effects on the Mean and Variance of Student Achievement.
IF 1.9 | Zone 3, Psychology | Q2 EDUCATION & EDUCATIONAL RESEARCH | Pub Date: 2024-12-01 | Epub Date: 2023-11-27 | DOI: 10.3102/10769986231210808
George Leckie, Richard Parker, Harvey Goldstein, Kate Tilling

School value-added models are widely applied to study, monitor, and hold schools to account for school differences in student learning. The traditional model is a mixed-effects linear regression of student current achievement on student prior achievement, background characteristics, and a school random intercept effect. The latter is referred to as the school value-added score and measures the mean student covariate-adjusted achievement in each school. In this article, we argue that further insights may be gained by additionally studying the variance in this quantity in each school. These include the ability to identify both individual schools and school types that exhibit unusually high or low variability in student achievement, even after accounting for differences in student intakes. We explore and illustrate how this can be done via fitting mixed-effects location scale versions of the traditional school value-added model. We discuss the implications of our work for research and school accountability systems.
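
The two models contrasted in this abstract can be written out in generic notation. The block below is a sketch in the notation commonly used for mixed-effects location scale models, not necessarily the authors' exact specification; the symbols (student i in school j, covariate vector x, school effects u and v, covariance matrix Omega) are assumptions introduced for illustration.

```latex
% Traditional value-added model: u_j is the school value-added score
\[
y_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j + e_{ij}, \qquad
u_j \sim N(0, \sigma_u^2), \qquad e_{ij} \sim N(0, \sigma_e^2)
\]

% Location scale version: the residual variance is allowed to differ by school
\[
e_{ij} \sim N(0, \sigma_{e_j}^2), \qquad
\log \sigma_{e_j}^2 = \alpha_0 + v_j, \qquad
(u_j, v_j)^{\top} \sim N(\mathbf{0}, \boldsymbol{\Omega}_{uv})
\]
```

In this formulation u_j shifts the mean of covariate-adjusted achievement in school j, while v_j raises or lowers its within-school variance; the log link keeps the variance positive.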

Volume 49, Issue 6, pp. 879-911. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7617570/pdf/
Citations: 0
Improving Balance in Educational Measurement: A Legacy of E. F. Lindquist
IF 2.4 | Zone 3, Psychology | Q2 EDUCATION & EDUCATIONAL RESEARCH | Pub Date: 2024-01-07 | DOI: 10.3102/10769986231218306
Daniel Koretz
A critically important balance in educational measurement between practical concerns and matters of technique has atrophied in recent decades, and as a result, some important issues in the field have not been adequately addressed. I start with the work of E. F. Lindquist, who exemplified the balance that is now wanting. Lindquist was arguably the most prolific developer of achievement tests in the history of the field and an accomplished statistician, but he nonetheless focused extensively on the practical limitations of testing and their implications for test development, test use, and inference. I describe the withering of this balance and discuss two pressing issues that have not been adequately addressed as a result: the lack of robustness of performance standards and score inflation. I conclude by discussing steps toward reestablishing the needed balance.
Citations: 0
A Simple Technique Assessing Ordinal and Disordinal Interaction Effects
IF 2.4 | Zone 3, Psychology | Q2 EDUCATION & EDUCATIONAL RESEARCH | Pub Date: 2023-12-21 | DOI: 10.3102/10769986231217472
Sang-June Park, Youjae Yi
Previous research explicates ordinal and disordinal interactions through the concept of the “crossover point.” This point is determined via simple regression models of a focal predictor at specific moderator values and signifies the intersection of these models. An interaction effect is labeled as disordinal (or ordinal) when the crossover point falls within (or outside) the observable range of the focal predictor. However, this approach might yield erroneous conclusions due to the crossover point’s intrinsic nature as a random variable defined by mean and variance. To statistically evaluate ordinal and disordinal interactions, a comparison between the observable range and the confidence interval (CI) of the crossover point is crucial. Numerous methods for establishing CIs, including reparameterization and bootstrap techniques, exist. Yet, these alternative methods are scarcely employed in social science journals for assessing ordinal and disordinal interactions. This note introduces a straightforward approach for calculating CIs, leveraging an extension of the Johnson–Neyman technique.
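
To make the crossover point concrete, assume the usual moderated regression y = b0 + b1*x + b2*z + b3*x*z with focal predictor x and moderator z. The simple regression lines at any two moderator values intersect at x = -b2/b3. The sketch below estimates that point by ordinary least squares and attaches a percentile bootstrap CI. The bootstrap is one of the alternatives the abstract mentions, not the Johnson–Neyman extension the note itself proposes. The simulated data, function names, and the final decision rule are hypothetical.

```python
import numpy as np

def crossover_point(x, z, y):
    """Fit y = b0 + b1*x + b2*z + b3*x*z by OLS and return -b2/b3."""
    X = np.column_stack([np.ones_like(x), x, z, x * z])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return -b[2] / b[3]

def bootstrap_ci(x, z, y, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap CI for the crossover point (resampling cases)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    est = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)
        est[i] = crossover_point(x[idx], z[idx], y[idx])
    alpha = 1.0 - level
    lo, hi = np.quantile(est, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

# Hypothetical data: coefficients chosen so the true crossover is x = 2
rng = np.random.default_rng(1)
n = 500
x = rng.uniform(0, 4, n)
z = rng.integers(0, 2, n).astype(float)
y = 1.0 + 0.5 * x - 2.0 * z + 1.0 * x * z + rng.normal(0, 0.5, n)

point = crossover_point(x, z, y)
lo, hi = bootstrap_ci(x, z, y)

# Assumed decision rule: compare the CI with the observed range of x
if x.min() <= lo and hi <= x.max():
    label = "disordinal"
elif hi < x.min() or lo > x.max():
    label = "ordinal"
else:
    label = "inconclusive"
print(f"crossover = {point:.2f}, CI = [{lo:.2f}, {hi:.2f}], {label}")
```

Comparing the whole interval with the observed range, rather than the point estimate alone, reflects the abstract's argument that the crossover point is itself a random quantity.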
Citations: 0
A Comparison of Latent Semantic Analysis and Latent Dirichlet Allocation in Educational Measurement
IF 2.4 | Zone 3, Psychology | Q2 EDUCATION & EDUCATIONAL RESEARCH | Pub Date: 2023-11-27 | DOI: 10.3102/10769986231209446
Jordan M. Wheeler, Allan S. Cohen, Shiyu Wang
Topic models are mathematical and statistical models used to analyze textual data. The objective of topic models is to gain information about the latent semantic space of a set of related textual data. The semantic space of a set of textual data contains the relationship between documents and words and how they are used. Topic models are becoming more common in educational measurement research as a method for analyzing students’ responses to constructed-response items. Two popular topic models are latent semantic analysis (LSA) and latent Dirichlet allocation (LDA). LSA uses linear algebra techniques, whereas LDA uses an assumed statistical model and generative process. In educational measurement, LSA is often used in algorithmic scoring of essays due to its high reliability and agreement with human raters. LDA is often used as a supplemental analysis to gain additional information about students, such as their thinking and reasoning. This article reviews and compares the LSA and LDA topic models. This article also introduces a methodology for comparing the semantic spaces obtained by the two models and uses a simulation study to investigate their similarities.
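
The "linear algebra techniques" behind LSA are usually a truncated singular value decomposition of a term-document matrix. The following is a minimal sketch of that idea using NumPy only; the toy vocabulary, counts, and helper names are hypothetical, and no weighting scheme (such as tf-idf) is applied. LDA is not shown because it requires fitting a probabilistic generative model rather than a single matrix factorization.

```python
import numpy as np

# Hypothetical term-document counts (rows = terms, columns = student responses)
terms = ["fraction", "denominator", "numerator", "gravity", "force", "mass"]
counts = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
    [0, 0, 1, 1],
], dtype=float)

def lsa(matrix, k):
    """Rank-k LSA: return term and document coordinates in the latent space."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    term_vecs = U[:, :k] * s[:k]      # one row per term
    doc_vecs = Vt[:k, :].T * s[:k]    # one row per document
    return term_vecs, doc_vecs

def cosine(a, b):
    """Cosine similarity between two latent-space vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

term_vecs, doc_vecs = lsa(counts, k=2)
sim_same = cosine(doc_vecs[0], doc_vecs[1])   # two responses about fractions
sim_diff = cosine(doc_vecs[0], doc_vecs[2])   # fractions vs. physics
print(f"same topic: {sim_same:.2f}, different topic: {sim_diff:.2f}")
```

Responses that share vocabulary end up close together in the reduced space, which is the property automated essay scoring relies on when comparing a new response with previously scored ones.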
Citations: 0
Sample Size Calculation and Optimal Design for Multivariate Regression-Based Norming
IF 2.4 | Zone 3, Psychology | Q2 EDUCATION & EDUCATIONAL RESEARCH | Pub Date: 2023-11-22 | DOI: 10.3102/10769986231210807
Francesco Innocenti, M. Candel, Frans E. S. Tan, Gerard J. P. van Breukelen
Normative studies are needed to obtain norms for comparing individuals with the reference population on relevant clinical or educational measures. Norms can be obtained in an efficient way by regressing the test score on relevant predictors, such as age and sex. When several measures are normed with the same sample, a multivariate regression-based approach must be adopted for at least two reasons: (1) to take into account the correlations between the measures of the same subject, in order to test certain scientific hypotheses and to reduce misclassification of subjects in clinical practice, and (2) to reduce the number of significance tests involved in selecting predictors for the purpose of norming, thus preventing the inflation of the type I error rate. A new multivariate regression-based approach is proposed that combines all measures for an individual through the Mahalanobis distance, thus providing an indicator of the individual’s overall performance. Furthermore, optimal designs for the normative study are derived under five multivariate polynomial regression models, assuming multivariate normality and homoscedasticity of the residuals, and efficient robust designs are presented in case of uncertainty about the correct model for the analysis of the normative sample. Sample size calculation formulas are provided for the new Mahalanobis distance-based approach. The results are illustrated with data from the Maastricht Aging Study (MAAS).
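
The idea of combining several normed measures through the Mahalanobis distance can be sketched as follows. For an individual with residual vector e (observed scores minus regression-predicted scores) and residual covariance matrix S, the squared distance is D^2 = e' S^{-1} e. The example fits a multivariate linear regression by ordinary least squares on a simulated normative sample; the data, coefficients, and function names are hypothetical, and the paper's own design and sample size formulas are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical normative sample: two test scores predicted from age and sex
n = 400
age = rng.uniform(20, 80, n)
sex = rng.integers(0, 2, n).astype(float)
X = np.column_stack([np.ones(n), age, sex])
B_true = np.array([[50.0, 30.0], [-0.2, -0.1], [1.0, 2.0]])
Sigma_true = np.array([[9.0, 3.0], [3.0, 4.0]])
Y = X @ B_true + rng.multivariate_normal(np.zeros(2), Sigma_true, size=n)

# Multivariate OLS: one coefficient column per measure
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
resid = Y - X @ B_hat
Sigma_hat = resid.T @ resid / (n - X.shape[1])

def mahalanobis_sq(scores, predictors, B, Sigma):
    """Squared Mahalanobis distance of the residual vector for one person."""
    e = scores - predictors @ B
    return float(e @ np.linalg.solve(Sigma, e))

# A 65-year-old with sex coded 1: exactly average vs. clearly below the norm
new_x = np.array([1.0, 65.0, 1.0])
expected = new_x @ B_hat
d2_typical = mahalanobis_sq(expected, new_x, B_hat, Sigma_hat)
d2_low = mahalanobis_sq(expected - np.array([9.0, 6.0]), new_x, B_hat, Sigma_hat)
print(f"D^2 typical = {d2_typical:.2f}, D^2 low scorer = {d2_low:.2f}")
```

Because the distance accounts for the correlation between measures, it yields one overall indicator per person. Note that it grows for deviations in either direction, so a large value flags an unusual score profile rather than specifically a low one.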
Citations: 0
Corrigendum to Power Approximations for Overall Average Effects in Meta-Analysis With Dependent Effect Sizes
IF 2.4 | Zone 3, Psychology | Q2 EDUCATION & EDUCATIONAL RESEARCH | Pub Date: 2023-11-17 | DOI: 10.3102/10769986231207878
Citations: 0
Combining Human and Automated Scoring Methods in Experimental Assessments of Writing: A Case Study Tutorial
Zone 3, Psychology | Q2 EDUCATION & EDUCATIONAL RESEARCH | Pub Date: 2023-11-08 | DOI: 10.3102/10769986231207886
Reagan Mozer, Luke Miratrix, Jackie Eunjung Relyea, James S. Kim
In a randomized trial that collects text as an outcome, traditional approaches for assessing treatment impact require that each document first be manually coded for constructs of interest by human raters. An impact analysis can then be conducted to compare treatment and control groups, using the hand-coded scores as a measured outcome. This process is both time and labor-intensive, which creates a persistent barrier for large-scale assessments of text. Furthermore, enriching one’s understanding of a found impact on text outcomes via secondary analyses can be difficult without additional scoring efforts. The purpose of this article is to provide a pipeline for using machine-based text analytic and data mining tools to augment traditional text-based impact analysis by analyzing impacts across an array of automatically generated text features. In this way, we can explore what an overall impact signifies in terms of how the text has evolved due to treatment. Through a case study based on a recent field trial in education, we show that machine learning can indeed enrich experimental evaluations of text by providing a more comprehensive and fine-grained picture of the mechanisms that lead to stronger argumentative writing in a first- and second-grade content literacy intervention. Relying exclusively on human scoring, by contrast, is a lost opportunity. Overall, the workflow and analytical strategy we describe can serve as a template for researchers interested in performing their own experimental evaluations of text.
Citations: 0
A Two-Level Adaptive Test Battery
Zone 3, Psychology | Q2 EDUCATION & EDUCATIONAL RESEARCH | Pub Date: 2023-11-06 | DOI: 10.3102/10769986231209447
Wim J. van der Linden, Luping Niu, Seung W. Choi
A test battery with two different levels of adaptation is presented: a within-subtest level for the selection of the items in the subtests and a between-subtest level to move from one subtest to the next. The battery runs on a two-level model consisting of a regular response model for each of the subtests extended with a second level for the joint distribution of their abilities. The presentation of the model is followed by an optimized MCMC algorithm to update the posterior distribution of each of its ability parameters, select the items to Bayesian optimality, and adaptively move from one subtest to the next. Thanks to extremely rapid convergence of the Markov chain and simple posterior calculations, the algorithm can be used in real-world applications without any noticeable latency. Finally, an empirical study with a battery of short diagnostic subtests is shown to yield score accuracies close to traditional one-level adaptive testing with subtests of double lengths.
Citations: 0