Improving Balance in Educational Measurement: A Legacy of E. F. Lindquist
Pub Date: 2024-01-07 | DOI: 10.3102/10769986231218306
Daniel Koretz
A critically important balance in educational measurement between practical concerns and matters of technique has atrophied in recent decades, and as a result, some important issues in the field have not been adequately addressed. I start with the work of E. F. Lindquist, who exemplified the balance that is now wanting. Lindquist was arguably the most prolific developer of achievement tests in the history of the field and an accomplished statistician, but he nonetheless focused extensively on the practical limitations of testing and their implications for test development, test use, and inference. I describe the withering of this balance and discuss two pressing issues that have not been adequately addressed as a result: the lack of robustness of performance standards and score inflation. I conclude by discussing steps toward reestablishing the needed balance.
{"title":"Improving Balance in Educational Measurement: A Legacy of E. F. Lindquist","authors":"Daniel Koretz","doi":"10.3102/10769986231218306","DOIUrl":"https://doi.org/10.3102/10769986231218306","url":null,"abstract":"A critically important balance in educational measurement between practical concerns and matters of technique has atrophied in recent decades, and as a result, some important issues in the field have not been adequately addressed. I start with the work of E. F. Lindquist, who exemplified the balance that is now wanting. Lindquist was arguably the most prolific developer of achievement tests in the history of the field and an accomplished statistician, but he nonetheless focused extensively on the practical limitations of testing and their implications for test development, test use, and inference. I describe the withering of this balance and discuss two pressing issues that have not been adequately addressed as a result: the lack of robustness of performance standards and score inflation. I conclude by discussing steps toward reestablishing the needed balance.","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":"68 7","pages":""},"PeriodicalIF":2.4,"publicationDate":"2024-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139449121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Simple Technique Assessing Ordinal and Disordinal Interaction Effects
Pub Date: 2023-12-21 | DOI: 10.3102/10769986231217472
Sang-June Park, Youjae Yi
Previous research explicates ordinal and disordinal interactions through the concept of the “crossover point.” This point is determined via simple regression models of a focal predictor at specific moderator values and signifies the intersection of these models. An interaction effect is labeled as disordinal (or ordinal) when the crossover point falls within (or outside) the observable range of the focal predictor. However, this approach might yield erroneous conclusions due to the crossover point’s intrinsic nature as a random variable defined by mean and variance. To statistically evaluate ordinal and disordinal interactions, a comparison between the observable range and the confidence interval (CI) of the crossover point is crucial. Numerous methods for establishing CIs, including reparameterization and bootstrap techniques, exist. Yet, these alternative methods are scarcely employed in social science journals for assessing ordinal and disordinal interactions. This note introduces a straightforward approach for calculating CIs, leveraging an extension of the Johnson–Neyman technique.
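As a rough illustration of the crossover-point idea (a delta-method interval, not the authors' Johnson–Neyman extension; the simulated data and variable names are assumptions), the sketch below fits a moderated regression y = b0 + b1*x + b2*z + b3*x*z, computes the crossover point xc = -b2/b3, and checks whether its confidence interval lies inside the observed range of the focal predictor, which is the CI-based criterion for a disordinal interaction described above:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)            # focal predictor
z = rng.normal(size=n)            # moderator
y = 0.5 + 0.3 * x + 0.4 * z - 0.6 * x * z + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x, z, x * z]))
fit = sm.OLS(y, X).fit()
b = fit.params                     # [b0, b1, b2, b3]
V = fit.cov_params()

b2, b3 = b[2], b[3]
xc = -b2 / b3                      # crossover point on the focal predictor

# Delta-method standard error of xc = -b2/b3
grad = np.array([0.0, 0.0, -1.0 / b3, b2 / b3**2])
se_xc = np.sqrt(grad @ V @ grad)

lo, hi = xc - 1.96 * se_xc, xc + 1.96 * se_xc
print(f"crossover point: {xc:.3f}, 95% CI: [{lo:.3f}, {hi:.3f}]")
# Simplified check: the whole CI falls within the observed range of x (disordinal)
print("CI within observed range of x:", x.min() <= lo and hi <= x.max())
```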
{"title":"A Simple Technique Assessing Ordinal and Disordinal Interaction Effects","authors":"Sang-June Park, Youjae Yi","doi":"10.3102/10769986231217472","DOIUrl":"https://doi.org/10.3102/10769986231217472","url":null,"abstract":"Previous research explicates ordinal and disordinal interactions through the concept of the “crossover point.” This point is determined via simple regression models of a focal predictor at specific moderator values and signifies the intersection of these models. An interaction effect is labeled as disordinal (or ordinal) when the crossover point falls within (or outside) the observable range of the focal predictor. However, this approach might yield erroneous conclusions due to the crossover point’s intrinsic nature as a random variable defined by mean and variance. To statistically evaluate ordinal and disordinal interactions, a comparison between the observable range and the confidence interval (CI) of the crossover point is crucial. Numerous methods for establishing CIs, including reparameterization and bootstrap techniques, exist. Yet, these alternative methods are scarcely employed in social science journals for assessing ordinal and disordinal interactions. This note introduces a straightforward approach for calculating CIs, leveraging an extension of the Johnson–Neyman technique.","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":"1 5","pages":""},"PeriodicalIF":2.4,"publicationDate":"2023-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138952204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Comparison of Latent Semantic Analysis and Latent Dirichlet Allocation in Educational Measurement
Pub Date: 2023-11-27 | DOI: 10.3102/10769986231209446
Jordan M. Wheeler, Allan S. Cohen, Shiyu Wang
Topic models are mathematical and statistical models used to analyze textual data. The objective of topic models is to gain information about the latent semantic space of a set of related textual data. The semantic space of a set of textual data contains the relationship between documents and words and how they are used. Topic models are becoming more common in educational measurement research as a method for analyzing students’ responses to constructed-response items. Two popular topic models are latent semantic analysis (LSA) and latent Dirichlet allocation (LDA). LSA uses linear algebra techniques, whereas LDA uses an assumed statistical model and generative process. In educational measurement, LSA is often used in algorithmic scoring of essays due to its high reliability and agreement with human raters. LDA is often used as a supplemental analysis to gain additional information about students, such as their thinking and reasoning. This article reviews and compares the LSA and LDA topic models. This article also introduces a methodology for comparing the semantic spaces obtained by the two models and uses a simulation study to investigate their similarities.
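For readers who want to see the two models side by side, here is a minimal sketch using generic scikit-learn implementations (the toy responses and the choice of two topics are illustrative assumptions, not the data or code used in the article): LSA decomposes a weighted document-term matrix with truncated SVD, while LDA fits a probabilistic generative model to word counts.

```python
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.decomposition import TruncatedSVD, LatentDirichletAllocation

responses = [
    "the slope tells how fast the line rises",
    "the intercept is where the line crosses the axis",
    "plants need sunlight and water to grow",
    "photosynthesis turns sunlight into energy for the plant",
]

# LSA: linear-algebra decomposition of a (TF-IDF weighted) document-term matrix
tfidf = TfidfVectorizer().fit_transform(responses)
lsa = TruncatedSVD(n_components=2, random_state=0)
doc_space_lsa = lsa.fit_transform(tfidf)        # documents in the latent semantic space

# LDA: probabilistic generative model over word counts
counts = CountVectorizer().fit_transform(responses)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_space_lda = lda.fit_transform(counts)       # per-document topic proportions

print(doc_space_lsa.round(2))
print(doc_space_lda.round(2))
```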
{"title":"A Comparison of Latent Semantic Analysis and Latent Dirichlet Allocation in Educational Measurement","authors":"Jordan M. Wheeler, Allan S. Cohen, Shiyu Wang","doi":"10.3102/10769986231209446","DOIUrl":"https://doi.org/10.3102/10769986231209446","url":null,"abstract":"Topic models are mathematical and statistical models used to analyze textual data. The objective of topic models is to gain information about the latent semantic space of a set of related textual data. The semantic space of a set of textual data contains the relationship between documents and words and how they are used. Topic models are becoming more common in educational measurement research as a method for analyzing students’ responses to constructed-response items. Two popular topic models are latent semantic analysis (LSA) and latent Dirichlet allocation (LDA). LSA uses linear algebra techniques, whereas LDA uses an assumed statistical model and generative process. In educational measurement, LSA is often used in algorithmic scoring of essays due to its high reliability and agreement with human raters. LDA is often used as a supplemental analysis to gain additional information about students, such as their thinking and reasoning. This article reviews and compares the LSA and LDA topic models. This article also introduces a methodology for comparing the semantic spaces obtained by the two models and uses a simulation study to investigate their similarities.","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":"30 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2023-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139231033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sample Size Calculation and Optimal Design for Multivariate Regression-Based Norming
Pub Date: 2023-11-22 | DOI: 10.3102/10769986231210807
Francesco Innocenti, M. Candel, Frans E. S. Tan, Gerard J. P. van Breukelen
Normative studies are needed to obtain norms for comparing individuals with the reference population on relevant clinical or educational measures. Norms can be obtained in an efficient way by regressing the test score on relevant predictors, such as age and sex. When several measures are normed with the same sample, a multivariate regression-based approach must be adopted for at least two reasons: (1) to take into account the correlations between the measures of the same subject, in order to test certain scientific hypotheses and to reduce misclassification of subjects in clinical practice, and (2) to reduce the number of significance tests involved in selecting predictors for the purpose of norming, thus preventing the inflation of the type I error rate. A new multivariate regression-based approach is proposed that combines all measures for an individual through the Mahalanobis distance, thus providing an indicator of the individual’s overall performance. Furthermore, optimal designs for the normative study are derived under five multivariate polynomial regression models, assuming multivariate normality and homoscedasticity of the residuals, and efficient robust designs are presented in case of uncertainty about the correct model for the analysis of the normative sample. Sample size calculation formulas are provided for the new Mahalanobis distance-based approach. The results are illustrated with data from the Maastricht Aging Study (MAAS).
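A minimal sketch of the core idea under the stated assumptions of multivariate normality and homoscedastic residuals (the simulated data, predictors, and chi-square conversion below are illustrative assumptions, not the authors' exact procedure): regress several scores on age and sex in one multivariate model, then summarize how far an individual's vector of residuals lies from the reference population via the Mahalanobis distance.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k = 400, 3                                   # normative sample size, number of measures
age = rng.uniform(20, 80, n)
sex = rng.integers(0, 2, n)
X = np.column_stack([np.ones(n), age, sex])     # norming predictors
B = np.array([[30, 28, 25], [-0.1, -0.05, -0.08], [1.0, 0.5, -0.5]])
Sigma = np.array([[1.0, 0.5, 0.3], [0.5, 1.0, 0.4], [0.3, 0.4, 1.0]])
Y = X @ B + rng.multivariate_normal(np.zeros(k), Sigma, n)   # three correlated test scores

# Multivariate regression: one coefficient vector per measure, shared predictors
B_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
resid = Y - X @ B_hat
Sigma_hat = np.cov(resid, rowvar=False)         # residual covariance among the measures

# New individual: predicted scores given age/sex, then Mahalanobis distance of the residuals
x_new = np.array([1.0, 65.0, 1.0])
y_new = np.array([24.0, 23.0, 18.0])
d = y_new - x_new @ B_hat
md2 = d @ np.linalg.inv(Sigma_hat) @ d          # squared Mahalanobis distance

# Under multivariate normality, md2 is approximately chi-square with k degrees of freedom
overall_percentile = stats.chi2.cdf(md2, df=k)
print(f"squared Mahalanobis distance: {md2:.2f}, overall percentile: {overall_percentile:.3f}")
```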
{"title":"Sample Size Calculation and Optimal Design for Multivariate Regression-Based Norming","authors":"Francesco Innocenti, M. Candel, Frans E. S. Tan, Gerard J. P. van Breukelen","doi":"10.3102/10769986231210807","DOIUrl":"https://doi.org/10.3102/10769986231210807","url":null,"abstract":"Normative studies are needed to obtain norms for comparing individuals with the reference population on relevant clinical or educational measures. Norms can be obtained in an efficient way by regressing the test score on relevant predictors, such as age and sex. When several measures are normed with the same sample, a multivariate regression-based approach must be adopted for at least two reasons: (1) to take into account the correlations between the measures of the same subject, in order to test certain scientific hypotheses and to reduce misclassification of subjects in clinical practice, and (2) to reduce the number of significance tests involved in selecting predictors for the purpose of norming, thus preventing the inflation of the type I error rate. A new multivariate regression-based approach is proposed that combines all measures for an individual through the Mahalanobis distance, thus providing an indicator of the individual’s overall performance. Furthermore, optimal designs for the normative study are derived under five multivariate polynomial regression models, assuming multivariate normality and homoscedasticity of the residuals, and efficient robust designs are presented in case of uncertainty about the correct model for the analysis of the normative sample. Sample size calculation formulas are provided for the new Mahalanobis distance-based approach. The results are illustrated with data from the Maastricht Aging Study (MAAS).","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":"106 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2023-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139249099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Corrigendum to Power Approximations for Overall Average Effects in Meta-Analysis With Dependent Effect Sizes
Pub Date: 2023-11-17 | DOI: 10.3102/10769986231207878
Combining Human and Automated Scoring Methods in Experimental Assessments of Writing: A Case Study Tutorial
Pub Date: 2023-11-08 | DOI: 10.3102/10769986231207886
Reagan Mozer, Luke Miratrix, Jackie Eunjung Relyea, James S. Kim
In a randomized trial that collects text as an outcome, traditional approaches for assessing treatment impact require that each document first be manually coded for constructs of interest by human raters. An impact analysis can then be conducted to compare treatment and control groups, using the hand-coded scores as a measured outcome. This process is both time and labor-intensive, which creates a persistent barrier for large-scale assessments of text. Furthermore, enriching one’s understanding of a found impact on text outcomes via secondary analyses can be difficult without additional scoring efforts. The purpose of this article is to provide a pipeline for using machine-based text analytic and data mining tools to augment traditional text-based impact analysis by analyzing impacts across an array of automatically generated text features. In this way, we can explore what an overall impact signifies in terms of how the text has evolved due to treatment. Through a case study based on a recent field trial in education, we show that machine learning can indeed enrich experimental evaluations of text by providing a more comprehensive and fine-grained picture of the mechanisms that lead to stronger argumentative writing in a first- and second-grade content literacy intervention. Relying exclusively on human scoring, by contrast, is a lost opportunity. Overall, the workflow and analytical strategy we describe can serve as a template for researchers interested in performing their own experimental evaluations of text.
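A deliberately simplified sketch of the kind of pipeline described (a hypothetical toy corpus, hand-rolled stand-in features, and a plain two-sample comparison per feature; the actual workflow uses richer automatically generated text features and a fuller impact analysis):

```python
import pandas as pd
from scipy import stats

# Hypothetical corpus: one essay per student, with a treatment indicator
docs = pd.DataFrame({
    "text": [
        "I claim that frogs need wetlands because wetlands keep their eggs wet",
        "The evidence shows wetlands help frogs survive because they hold water",
        "Frogs should be protected because they eat harmful insects",
        "Frogs are green and they jump far",
        "Frogs live in ponds because ponds have water",
        "A frog is an animal",
    ],
    "treated": [1, 1, 1, 0, 0, 0],
})

# Simple automatically generated features (stand-ins for richer NLP features)
docs["n_words"] = docs["text"].str.split().str.len()
docs["n_unique"] = docs["text"].str.split().apply(lambda w: len(set(w)))
docs["uses_because"] = docs["text"].str.contains("because").astype(int)

# Estimate a treatment impact on each text feature separately
for feat in ["n_words", "n_unique", "uses_because"]:
    t = docs.loc[docs.treated == 1, feat]
    c = docs.loc[docs.treated == 0, feat]
    diff = t.mean() - c.mean()
    stat, p = stats.ttest_ind(t, c, equal_var=False)
    print(f"{feat}: impact = {diff:.2f}, p = {p:.3f}")
```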
{"title":"Combining Human and Automated Scoring Methods in Experimental Assessments of Writing: A Case Study Tutorial","authors":"Reagan Mozer, Luke Miratrix, Jackie Eunjung Relyea, James S. Kim","doi":"10.3102/10769986231207886","DOIUrl":"https://doi.org/10.3102/10769986231207886","url":null,"abstract":"In a randomized trial that collects text as an outcome, traditional approaches for assessing treatment impact require that each document first be manually coded for constructs of interest by human raters. An impact analysis can then be conducted to compare treatment and control groups, using the hand-coded scores as a measured outcome. This process is both time and labor-intensive, which creates a persistent barrier for large-scale assessments of text. Furthermore, enriching one’s understanding of a found impact on text outcomes via secondary analyses can be difficult without additional scoring efforts. The purpose of this article is to provide a pipeline for using machine-based text analytic and data mining tools to augment traditional text-based impact analysis by analyzing impacts across an array of automatically generated text features. In this way, we can explore what an overall impact signifies in terms of how the text has evolved due to treatment. Through a case study based on a recent field trial in education, we show that machine learning can indeed enrich experimental evaluations of text by providing a more comprehensive and fine-grained picture of the mechanisms that lead to stronger argumentative writing in a first- and second-grade content literacy intervention. Relying exclusively on human scoring, by contrast, is a lost opportunity. Overall, the workflow and analytical strategy we describe can serve as a template for researchers interested in performing their own experimental evaluations of text.","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":"159 8‐10","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135393035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Two-Level Adaptive Test Battery
Pub Date: 2023-11-06 | DOI: 10.3102/10769986231209447
Wim J. van der Linden, Luping Niu, Seung W. Choi
A test battery with two different levels of adaptation is presented: a within-subtest level for the selection of the items in the subtests and a between-subtest level to move from one subtest to the next. The battery runs on a two-level model consisting of a regular response model for each of the subtests extended with a second level for the joint distribution of their abilities. The presentation of the model is followed by an optimized MCMC algorithm to update the posterior distribution of each of its ability parameters, select the items to Bayesian optimality, and adaptively move from one subtest to the next. Thanks to extremely rapid convergence of the Markov chain and simple posterior calculations, the algorithm can be used in real-world applications without any noticeable latency. Finally, an empirical study with a battery of short diagnostic subtests is shown to yield score accuracies close to traditional one-level adaptive testing with subtests of double lengths.
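To make the within-subtest level concrete, here is a deliberately simplified single-subtest sketch: a grid approximation to the ability posterior (the article uses an optimized MCMC algorithm and a second level across subtests) with items selected by minimum expected posterior variance, one common Bayesian optimality criterion. The 2PL item bank and the true ability value are made-up assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical 2PL item bank for one subtest: discriminations a, difficulties b
a = rng.uniform(0.8, 2.0, 30)
b = rng.normal(0.0, 1.0, 30)

theta_grid = np.linspace(-4, 4, 161)
posterior = np.exp(-0.5 * theta_grid**2)        # standard normal prior, updated after each item
posterior /= posterior.sum()

def p_correct(theta, j):
    """2PL probability of a correct response to item j."""
    return 1.0 / (1.0 + np.exp(-a[j] * (theta - b[j])))

true_theta = 0.7
administered = []

for step in range(10):
    # Select the unused item with the smallest expected posterior variance of theta
    best_j, best_ev = None, np.inf
    for j in range(len(a)):
        if j in administered:
            continue
        pj = p_correct(theta_grid, j)
        exp_var = 0.0
        for like in (pj, 1.0 - pj):             # correct / incorrect response
            marg = (posterior * like).sum()
            post_u = posterior * like / marg
            mean_u = (post_u * theta_grid).sum()
            exp_var += marg * (post_u * (theta_grid - mean_u) ** 2).sum()
        if exp_var < best_ev:
            best_j, best_ev = j, exp_var

    # Administer the selected item and update the posterior with the simulated response
    u = rng.random() < p_correct(true_theta, best_j)
    like = p_correct(theta_grid, best_j) if u else 1.0 - p_correct(theta_grid, best_j)
    posterior = posterior * like
    posterior /= posterior.sum()
    administered.append(best_j)

est = (posterior * theta_grid).sum()
print(f"ability estimate after {len(administered)} adaptive items: {est:.2f}")
```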
{"title":"A Two-Level Adaptive Test Battery","authors":"Wim J. van der Linden, Luping Niu, Seung W. Choi","doi":"10.3102/10769986231209447","DOIUrl":"https://doi.org/10.3102/10769986231209447","url":null,"abstract":"A test battery with two different levels of adaptation is presented: a within-subtest level for the selection of the items in the subtests and a between-subtest level to move from one subtest to the next. The battery runs on a two-level model consisting of a regular response model for each of the subtests extended with a second level for the joint distribution of their abilities. The presentation of the model is followed by an optimized MCMC algorithm to update the posterior distribution of each of its ability parameters, select the items to Bayesian optimality, and adaptively move from one subtest to the next. Thanks to extremely rapid convergence of the Markov chain and simple posterior calculations, the algorithm can be used in real-world applications without any noticeable latency. Finally, an empirical study with a battery of short diagnostic subtests is shown to yield score accuracies close to traditional one-level adaptive testing with subtests of double lengths.","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":"43 9","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135681275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analyzing Polytomous Test Data: A Comparison Between an Information-Based IRT Model and the Generalized Partial Credit Model
Pub Date: 2023-11-06 | DOI: 10.3102/10769986231207879
Joakim Wallmark, James O. Ramsay, Juan Li, Marie Wiberg
Item response theory (IRT) models the relationship between the possible scores on a test item and a test taker’s attainment of the latent trait that the item is intended to measure. In this study, we compare two models for tests with polytomously scored items: the optimal scoring (OS) model, a nonparametric IRT model based on the principles of information theory, and the generalized partial credit (GPC) model, a widely used parametric alternative. We evaluate these models using both simulated and real test data. In the real data examples, the OS model demonstrates superior model fit compared to the GPC model across all analyzed datasets. In our simulation study, the OS model outperforms the GPC model in terms of bias, but at the cost of larger standard errors for the probabilities along the estimated item response functions. Furthermore, we illustrate how surprisal arc length, an IRT scale-invariant measure of ability with metric properties, can be used to put scores from vastly different types of IRT models on a common scale. We also demonstrate how arc length can be a viable alternative to sum scores for scoring test takers.
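For reference, here is a small sketch of the parametric comparison model: the standard GPC category probabilities for one polytomous item, with illustrative (made-up) item parameters. The OS model, being nonparametric, has no closed-form counterpart like this.

```python
import numpy as np

def gpc_probs(theta, a, b):
    """Generalized partial credit category probabilities for one item.

    theta : ability value
    a     : item discrimination
    b     : step difficulties b_1..b_m (category 0 contributes no step term)
    """
    steps = np.concatenate([[0.0], a * (theta - np.asarray(b))])
    num = np.exp(np.cumsum(steps))              # cumulative step terms, exponentiated
    return num / num.sum()

# Hypothetical 4-category item (scores 0-3)
print(gpc_probs(theta=0.5, a=1.2, b=[-1.0, 0.0, 1.5]).round(3))
```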
{"title":"Analyzing Polytomous Test Data: A Comparison Between an Information-Based IRT Model and the Generalized Partial Credit Model","authors":"Joakim Wallmark, James O. Ramsay, Juan Li, Marie Wiberg","doi":"10.3102/10769986231207879","DOIUrl":"https://doi.org/10.3102/10769986231207879","url":null,"abstract":"Item response theory (IRT) models the relationship between the possible scores on a test item against a test taker’s attainment of the latent trait that the item is intended to measure. In this study, we compare two models for tests with polytomously scored items: the optimal scoring (OS) model, a nonparametric IRT model based on the principles of information theory, and the generalized partial credit (GPC) model, a widely used parametric alternative. We evaluate these models using both simulated and real test data. In the real data examples, the OS model demonstrates superior model fit compared to the GPC model across all analyzed datasets. In our simulation study, the OS model outperforms the GPC model in terms of bias, but at the cost of larger standard errors for the probabilities along the estimated item response functions. Furthermore, we illustrate how surprisal arc length, an IRT scale invariant measure of ability with metric properties, can be used to put scores from vastly different types of IRT models on a common scale. We also demonstrate how arc length can be a viable alternative to sum scores for scoring test takers.","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":"43 19","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135681661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Introduction to JEBS Special Issue on Diagnostic Statistical Models
Pub Date: 2023-10-26 | DOI: 10.3102/10769986231210002
Steven Andrew Culpepper, Gongjun Xu
The COVID-19 pandemic forced millions of students to transition from traditional in-person instruction into a learning environment that incorporates facets of social distancing and online education (National Center for Education Statistics, 2022). One consequence is that the massive disruption of the COVID-19 health crisis is related to the largest declines in elementary and secondary students’ educational achievement as inferred from recent results of the National Assessment of Educational Progress long-term trend (U.S. Department of Education, 2022). Accordingly, recent events have raised awareness of the need for robust formative assessments to accelerate learning and improve educational and behavioral outcomes.
{"title":"Introduction to <i>JEBS</i> Special Issue on Diagnostic Statistical Models","authors":"Steven Andrew Culpepper, Gongjun Xu","doi":"10.3102/10769986231210002","DOIUrl":"https://doi.org/10.3102/10769986231210002","url":null,"abstract":"The COVID-19 pandemic forced millions of students to transition from traditional in-person instruction into a learning environment that incorporates facets of social distancing and online education (National Center for Education Statistics, 2022). One consequence is that the massive disruption of the COVID-19 health crisis is related to the largest declines in elementary and secondary students’ educational achievement as inferred from recent results of the National Assessment of Educational Progress long-term trend (U.S. Department of Education, 2022). Accordingly, recent events have raised awareness of the need for robust formative assessments to accelerate learning and improve educational and behavioral outcomes. The","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":"41 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134908866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pairwise Regression Weight Contrasts: Models for Allocating Psychological Resources
Pub Date: 2023-10-13 | DOI: 10.3102/10769986231200155
Mark L. Davison, Hao Jia, Ernest C. Davenport
Researchers examine contrasts between analysis of variance (ANOVA) effects but seldom contrasts between regression coefficients even though such coefficients are an ANOVA generalization. Regression weight contrasts can be analyzed by reparameterizing the linear model. Two pairwise contrast models are developed for the study of qualitative differences among predictors. One leads to tests of null hypotheses that the regression weight for a reference predictor equals each of the other weights. The second involves ordered predictors and null hypotheses that the weight for a predictor equals that for the variables just above or below in the ordering. As illustration, qualitative differences in high school math course content are related to math achievement. The models facilitate the study of qualitative differences among predictors and the allocation of resources. They also readily generalize to moderated, hierarchical, and generalized linear forms.
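A minimal sketch of the reference-predictor reparameterization idea (simulated data and variable names are assumptions, and this is one standard reparameterization rather than the authors' full set of models): rewriting y = b0 + b1*x1 + b2*x2 + e as y = b0 + b1*(x1 + x2) + (b2 - b1)*x2 + e makes the coefficient on x2 the pairwise contrast b2 - b1, so its ordinary t-test is the test of H0: b1 = b2.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 300
x1 = rng.normal(size=n)                         # e.g., reference predictor (algebra credits)
x2 = rng.normal(size=n)                         # e.g., comparison predictor (geometry credits)
y = 1.0 + 0.5 * x1 + 0.8 * x2 + rng.normal(size=n)

# Standard parameterization: separate weights for x1 and x2
fit0 = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

# Reparameterized model: the coefficient on x2 is now the contrast b2 - b1,
# so its t-test directly tests whether the two regression weights differ
fit1 = sm.OLS(y, sm.add_constant(np.column_stack([x1 + x2, x2]))).fit()

print("weights b1, b2:", fit0.params[1:].round(3))
print("contrast b2 - b1:", fit1.params[2].round(3), "p-value:", fit1.pvalues[2].round(4))
```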
{"title":"Pairwise Regression Weight Contrasts: Models for Allocating Psychological Resources","authors":"Mark L. Davison, Hao Jia, Ernest C. Davenport","doi":"10.3102/10769986231200155","DOIUrl":"https://doi.org/10.3102/10769986231200155","url":null,"abstract":"Researchers examine contrasts between analysis of variance (ANOVA) effects but seldom contrasts between regression coefficients even though such coefficients are an ANOVA generalization. Regression weight contrasts can be analyzed by reparameterizing the linear model. Two pairwise contrast models are developed for the study of qualitative differences among predictors. One leads to tests of null hypotheses that the regression weight for a reference predictor equals each of the other weights. The second involves ordered predictors and null hypotheses that the weight for a predictor equals that for the variables just above or below in the ordering. As illustration, qualitative differences in high school math course content are related to math achievement. The models facilitate the study of qualitative differences among predictors and the allocation of resources. They also readily generalize to moderated, hierarchical, and generalized linear forms.","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135859026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}