Sample size determination for interval estimation of the prevalence of a sensitive attribute under non-randomized response models
Shi-Fang Qiu, Jie Lei, Wai-Yin Poon, Man-Lai Tang, Ricky S. Wong, Ji-Ran Tao
A sufficient number of participants should be included to adequately address the research interest in surveys with sensitive questions. In this paper, sample size formulas and iterative algorithms are developed from the perspective of controlling the confidence interval width for the prevalence of a sensitive attribute under four non-randomized response models: the crosswise model, the parallel model, the Poisson item count technique model and the negative binomial item count technique model. In contrast to the conventional approach to sample size determination, our sample size formulas/algorithms explicitly incorporate an assurance probability that the width of a confidence interval is controlled within the pre-specified range. The performance of the proposed methods is evaluated with respect to the empirical coverage probability, the empirical assurance probability and the confidence interval width. Simulation results show that all formulas/algorithms are effective and hence are recommended for practical applications. A real example is used to illustrate the proposed methods.
{"title":"Sample size determination for interval estimation of the prevalence of a sensitive attribute under non-randomized response models","authors":"Shi-Fang Qiu, Jie Lei, Wai-Yin Poon, Man-Lai Tang, Ricky S. Wong, Ji-Ran Tao","doi":"10.1111/bmsp.12338","DOIUrl":"10.1111/bmsp.12338","url":null,"abstract":"<p>A sufficient number of participants should be included to adequately address the research interest in the surveys with sensitive questions. In this paper, sample size formulas/iterative algorithms are developed from the perspective of controlling the confidence interval width of the prevalence of a sensitive attribute under four non-randomized response models: the crosswise model, parallel model, Poisson item count technique model and negative binomial item count technique model. In contrast to the conventional approach for sample size determination, our sample size formulas/algorithms explicitly incorporate an assurance probability of controlling the width of a confidence interval within the pre-specified range. The performance of the proposed methods is evaluated with respect to the empirical coverage probability, empirical assurance probability and confidence width. Simulation results show that all formulas/algorithms are effective and hence are recommended for practical applications. A real example is used to illustrate the proposed methods.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":"77 3","pages":"508-531"},"PeriodicalIF":1.8,"publicationDate":"2024-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139974734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Assessment of fit of the time-varying dynamic partial credit model using the posterior predictive model checking method
Sebastian Castro-Alvarez, Sandip Sinharay, Laura F. Bringmann, Rob R. Meijer, Jorge N. Tendeiro
Several new models based on item response theory have recently been suggested to analyse intensive longitudinal data. One of these new models is the time-varying dynamic partial credit model (TV-DPCM; Castro-Alvarez et al., Multivariate Behavioral Research, 2023, 1), which is a combination of the partial credit model and the time-varying autoregressive model. The model allows the study of the psychometric properties of the items and the modelling of nonlinear trends at the latent state level. However, there is a severe lack of tools to assess the fit of the TV-DPCM. In this paper, we propose and develop several test statistics and discrepancy measures based on the posterior predictive model checking (PPMC) method (Rubin, The Annals of Statistics, 1984, 12, 1151) to assess the fit of the TV-DPCM. Simulated and empirical data are used to study the performance of the PPMC method and to illustrate its effectiveness.
{"title":"Assessment of fit of the time-varying dynamic partial credit model using the posterior predictive model checking method","authors":"Sebastian Castro-Alvarez, Sandip Sinharay, Laura F. Bringmann, Rob R. Meijer, Jorge N. Tendeiro","doi":"10.1111/bmsp.12339","DOIUrl":"10.1111/bmsp.12339","url":null,"abstract":"<p>Several new models based on item response theory have recently been suggested to analyse intensive longitudinal data. One of these new models is the time-varying dynamic partial credit model (TV-DPCM; Castro-Alvarez et al., <i>Multivariate Behavioral Research</i>, 2023, 1), which is a combination of the partial credit model and the time-varying autoregressive model. The model allows the study of the psychometric properties of the items and the modelling of nonlinear trends at the latent state level. However, there is a severe lack of tools to assess the fit of the TV-DPCM. In this paper, we propose and develop several test statistics and discrepancy measures based on the posterior predictive model checking (PPMC) method (PPMC; Rubin, <i>The Annals of Statistics</i>, 1984, 12, 1151) to assess the fit of the TV-DPCM. Simulated and empirical data are used to study the performance of and illustrate the effectiveness of the PPMC method.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":"77 3","pages":"532-552"},"PeriodicalIF":1.8,"publicationDate":"2024-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139914100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
When and how to use set-exploratory structural equation modelling to test structural models: A tutorial using the R package lavaan
Herb Marsh, Abdullah Alamer
Exploratory structural equation modelling (ESEM) is an alternative to the well-known method of confirmatory factor analysis (CFA). ESEM is mainly used to assess the quality of measurement models of common factors but can be efficiently extended to test structural models. However, ESEM may not be the best option in some model specifications, especially when structural models are involved, because the full flexibility of ESEM can create technical difficulties in model estimation. Thus, set-ESEM was developed to strike a balance between full-ESEM and CFA. In the present paper, we show examples where set-ESEM should be used rather than full-ESEM. Rather than relying on a simulation study, we provide two applied examples using real data that are included in the OSF repository. Additionally, we provide the code needed to run set-ESEM in the free R package lavaan to make the paper practical. Set-ESEM structural models outperform their CFA-based counterparts in terms of goodness of fit and more realistic factor correlations, and hence path coefficients, in the two empirical examples. In several instances, effects that were non-significant (i.e., attenuated) in the CFA-based structural model become larger and significant in the set-ESEM structural model, suggesting that set-ESEM models may generate more accurate model parameters and, hence, a lower Type II error rate.
{"title":"When and how to use set-exploratory structural equation modelling to test structural models: A tutorial using the R package lavaan","authors":"Herb Marsh, Abdullah Alamer","doi":"10.1111/bmsp.12336","DOIUrl":"10.1111/bmsp.12336","url":null,"abstract":"<p>Exploratory structural equation modelling (ESEM) is an alternative to the well-known method of confirmatory factor analysis (CFA). ESEM is mainly used to assess the quality of measurement models of common factors but can be efficiently extended to test structural models. However, ESEM may not be the best option in some model specifications, especially when structural models are involved, because the full flexibility of ESEM could result in technical difficulties in model estimation. Thus, set-ESEM was developed to accommodate the balance between full-ESEM and CFA. In the present paper, we show examples where set-ESEM should be used rather than full-ESEM. Rather than relying on a simulation study, we provide two applied examples using real data that are included in the OSF repository. Additionally, we provide the code needed to run set-ESEM in the free R package <i>lavaan</i> to make the paper practical. Set-ESEM structural models outperform their CFA-based counterparts in terms of goodness of fit and realistic factor correlation, and hence path coefficients in the two empirical examples. In several instances, effects that were non-significant (i.e., attenuated) in the CFA-based structural model become larger and significant in the set-ESEM structural model, suggesting that set-ESEM models may generate more accurate model parameters and, hence, lower Type II error rate.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":"77 3","pages":"459-476"},"PeriodicalIF":1.8,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/bmsp.12336","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139742778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fast estimation of generalized linear latent variable models for performance and process data with ordinal, continuous, and count observed variables
Maoxin Zhang, Björn Andersson, Shaobo Jin
Different data types often occur in psychological and educational measurement, for example in computer-based assessments that record performance and process data (e.g., response times and the number of actions). Modelling such data requires specific models for each data type and must accommodate complex dependencies between multiple variables. Generalized linear latent variable models are suitable for modelling mixed data simultaneously, but estimation can be computationally demanding. A fast solution is to use Laplace approximations, but existing implementations of joint modelling of mixed data types are limited to ordinal and continuous data. To address this limitation, we derive an efficient estimation method that uses first- or second-order Laplace approximations to simultaneously model ordinal, continuous, and count data. We illustrate the approach with an example and conduct simulations to evaluate the performance of the method in terms of estimation efficiency, convergence, and parameter recovery. The results suggest that the second-order Laplace approximation achieves a higher convergence rate and produces accurate yet fast parameter estimates compared to the first-order Laplace approximation, while the time cost increases with higher model complexity. Additionally, models that consider the dependence of variables from the same stimulus fit the empirical data substantially better than models that disregard the dependence.
{"title":"Fast estimation of generalized linear latent variable models for performance and process data with ordinal, continuous, and count observed variables","authors":"Maoxin Zhang, Björn Andersson, Shaobo Jin","doi":"10.1111/bmsp.12337","DOIUrl":"10.1111/bmsp.12337","url":null,"abstract":"<p>Different data types often occur in psychological and educational measurement such as computer-based assessments that record performance and process data (e.g., response times and the number of actions). Modelling such data requires specific models for each data type and accommodating complex dependencies between multiple variables. Generalized linear latent variable models are suitable for modelling mixed data simultaneously, but estimation can be computationally demanding. A fast solution is to use Laplace approximations, but existing implementations of joint modelling of mixed data types are limited to ordinal and continuous data. To address this limitation, we derive an efficient estimation method that uses first- or second-order Laplace approximations to simultaneously model ordinal data, continuous data, and count data. We illustrate the approach with an example and conduct simulations to evaluate the performance of the method in terms of estimation efficiency, convergence, and parameter recovery. The results suggest that the second-order Laplace approximation achieves a higher convergence rate and produces accurate yet fast parameter estimates compared to the first-order Laplace approximation, while the time cost increases with higher model complexity. Additionally, models that consider the dependence of variables from the same stimulus fit the empirical data substantially better than models that disregarded the dependence.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":"77 3","pages":"477-507"},"PeriodicalIF":1.8,"publicationDate":"2024-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/bmsp.12337","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139725087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Constructing tests for skill assessment with competence-based test development
Pasquale Anselmi, Jürgen Heller, Luca Stefanutti, Egidio Robusto
Competence-based test development is a recent and innovative method for constructing tests that are as informative as possible about the competence state (the set of skills an individual has available) underlying the observed item responses. It finds application in different contexts, including the development of tests from scratch and the improvement or shortening of existing tests. Given a fixed collection of competence states existing in a population of individuals and a fixed collection of competencies (each of which is a subset of skills that allows for solving an item), the competency deletion procedure results in tests that differ from each other in the competencies but are all equally informative about individuals' competence states. This work introduces a streamlined version of the competency deletion procedure that considers only the information necessary for test construction, illustrates a straightforward way to incorporate test developer preferences about competencies into the test construction process, and evaluates the performance of the resulting tests in uncovering competence states from observed item responses.
{"title":"Constructing tests for skill assessment with competence-based test development","authors":"Pasquale Anselmi, Jürgen Heller, Luca Stefanutti, Egidio Robusto","doi":"10.1111/bmsp.12335","DOIUrl":"10.1111/bmsp.12335","url":null,"abstract":"<p>Competence-based test development is a recent and innovative method for the construction of tests that are as informative as possible about the competence state (the set of skills an individual has available) underlying the observed item responses. It finds application in different contexts, including the development of tests from scratch, and the improvement or shortening of existing tests. Given a fixed collection of competence states existing in a population of individuals and a fixed collection of competencies (each of which being the subset of skills that allow for solving an item), the competency deletion procedure results in tests that differ from each other in the competencies but are all equally informative about individuals' competence states. This work introduces a streamlined version of the competency deletion procedure that considers information necessary for test construction only, illustrates a straightforward way to incorporate test developer preferences about competencies into the test construction process, and evaluates the performance of the resulting tests in uncovering the competence states from the observed item responses.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":"77 3","pages":"429-458"},"PeriodicalIF":1.8,"publicationDate":"2024-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139661345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Identifiability and estimability of Bayesian linear and nonlinear crossed random effects models
Corissa T. Rohloff, Nidhi Kohli, Eric F. Lock
Crossed random effects models (CREMs) are particularly useful in longitudinal data applications because they allow researchers to account for the impact of dynamic group membership on individual outcomes. However, no research has determined what data conditions need to be met to sufficiently identify these models, especially the group effects, in a longitudinal context. This is a significant gap in the current literature, as future applications to real data may need to consider these conditions to yield accurate and precise model parameter estimates, specifically for the group effects on individual outcomes. Furthermore, there are no existing CREMs that can model intrinsically nonlinear growth. The goals of this study are to develop a Bayesian piecewise CREM to model intrinsically nonlinear growth and to evaluate what data conditions are necessary to empirically identify both intrinsically linear and nonlinear longitudinal CREMs. This study includes an applied example that utilizes the piecewise CREM with real data, and three simulation studies that assess the data conditions necessary to estimate linear, quadratic, and piecewise CREMs. Results show that the number of repeated measurements collected on groups impacts the ability to recover the group effects. Additionally, functional form complexity impacts data collection requirements for estimating longitudinal CREMs.
{"title":"Identifiability and estimability of Bayesian linear and nonlinear crossed random effects models","authors":"Corissa T. Rohloff, Nidhi Kohli, Eric F. Lock","doi":"10.1111/bmsp.12334","DOIUrl":"10.1111/bmsp.12334","url":null,"abstract":"<p>Crossed random effects models (CREMs) are particularly useful in longitudinal data applications because they allow researchers to account for the impact of dynamic group membership on individual outcomes. However, no research has determined what data conditions need to be met to sufficiently identify these models, especially the group effects, in a longitudinal context. This is a significant gap in the current literature as future applications to real data may need to consider these conditions to yield accurate and precise model parameter estimates, specifically for the group effects on individual outcomes. Furthermore, there are no existing CREMs that can model intrinsically nonlinear growth. The goals of this study are to develop a Bayesian piecewise CREM to model intrinsically nonlinear growth and evaluate what data conditions are necessary to empirically identify both intrinsically linear and nonlinear longitudinal CREMs. This study includes an applied example that utilizes the piecewise CREM with real data and three simulation studies to assess the data conditions necessary to estimate linear, quadratic, and piecewise CREMs. Results show that the number of repeated measurements collected on groups impacts the ability to recover the group effects. Additionally, functional form complexity impacted data collection requirements for estimating longitudinal CREMs.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":"77 2","pages":"375-394"},"PeriodicalIF":1.8,"publicationDate":"2024-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/bmsp.12334","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139543995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Statistical inference for agreement between multiple raters on a binary scale
Sophie Vanbelle
Agreement studies often involve more than two raters or repeated measurements. In the presence of two raters, the proportion of agreement and the proportion of positive agreement are simple and popular agreement measures for binary scales. These measures were generalized to agreement studies involving more than two raters, with statistical inference procedures proposed on an empirical basis. We present two alternatives. The first is a Wald confidence interval using standard errors obtained by the delta method. The second involves Bayesian statistical inference that does not require any specific Bayesian software. These new procedures show better statistical behaviour than the confidence intervals initially proposed. In addition, we provide analytical formulas to determine the minimum number of persons needed for a given number of raters when planning an agreement study. All methods are implemented in the R package simpleagree and the Shiny app simpleagree.
{"title":"Statistical inference for agreement between multiple raters on a binary scale","authors":"Sophie Vanbelle","doi":"10.1111/bmsp.12333","DOIUrl":"10.1111/bmsp.12333","url":null,"abstract":"<p>Agreement studies often involve more than two raters or repeated measurements. In the presence of two raters, the proportion of agreement and of positive agreement are simple and popular agreement measures for binary scales. These measures were generalized to agreement studies involving more than two raters with statistical inference procedures proposed on an empirical basis. We present two alternatives. The first is a Wald confidence interval using standard errors obtained by the delta method. The second involves Bayesian statistical inference not requiring any specific Bayesian software. These new procedures show better statistical behaviour than the confidence intervals initially proposed. In addition, we provide analytical formulas to determine the minimum number of persons needed for a given number of raters when planning an agreement study. All methods are implemented in the R package <i>simpleagree</i> and the Shiny app <i>simpleagree</i>.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":"77 2","pages":"245-260"},"PeriodicalIF":1.8,"publicationDate":"2024-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/bmsp.12333","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139486878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A cluster differences unfolding method for large datasets of preference ratings on an interval scale: Minimizing the mean squared centred residuals
Rodrigo Macías, J. Fernando Vera, Willem J. Heiser
Clustering and spatial representation methods are often used in combination to analyse preference ratings when a large number of individuals and/or objects is involved. When analysed under an unfolding model, row-conditional linear transformations are usually most appropriate when the goal is to determine clusters of individuals with similar preferences. However, a significant problem with transformations that include both slope and intercept is the occurrence of degenerate solutions. In this paper, we propose a least squares unfolding method that performs clustering of individuals while simultaneously estimating the location of cluster centres and object locations in low-dimensional space. The method is based on minimising the mean squared centred residuals of the preference ratings with respect to the distances between cluster centres and object locations. At the same time, the distances are row-conditionally transformed with optimally estimated slope parameters. The method is computationally efficient for large datasets and does not suffer from the appearance of degenerate solutions. Its performance is analysed in an extensive Monte Carlo experiment and illustrated on a real dataset, and the results are compared with those obtained using a two-step clustering and unfolding procedure.
{"title":"A cluster differences unfolding method for large datasets of preference ratings on an interval scale: Minimizing the mean squared centred residuals","authors":"Rodrigo Macías, J. Fernando Vera, Willem J. Heiser","doi":"10.1111/bmsp.12332","DOIUrl":"10.1111/bmsp.12332","url":null,"abstract":"<p>Clustering and spatial representation methods are often used in combination, to analyse preference ratings when a large number of individuals and/or object is involved. When analysed under an unfolding model, row-conditional linear transformations are usually most appropriate when the goal is to determine clusters of individuals with similar preferences. However, a significant problem with transformations that include both slope and intercept is the occurrence of degenerate solutions. In this paper, we propose a least squares unfolding method that performs clustering of individuals while simultaneously estimating the location of cluster centres and object locations in low-dimensional space. The method is based on minimising the mean squared centred residuals of the preference ratings with respect to the distances between cluster centres and object locations. At the same time, the distances are row-conditionally transformed with optimally estimated slope parameters. It is computationally efficient for large datasets, and does not suffer from the appearance of degenerate solutions. The performance of the method is analysed in an extensive Monte Carlo experiment. It is illustrated for a real data set and the results are compared with those obtained using a two-step clustering and unfolding procedure.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":"77 2","pages":"356-374"},"PeriodicalIF":1.8,"publicationDate":"2024-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139426139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Correcting for measurement error under meta-analysis of z-transformed correlations
Qian Zhang, Qi Wang
This study mainly concerns correction for measurement error using the meta-analysis of Fisher's z-transformed correlations. The disattenuation formula of Spearman (American Journal of Psychology, 15, 1904, 72) is used to correct individual raw correlations in primary studies. The corrected raw correlations are then used to obtain the corrected z-transformed correlations. What remains little studied, however, is how best to correct the within-study sampling error variances of corrected z-transformed correlations. We focused on three within-study sampling error variance estimators corrected for measurement error, some proposed in earlier studies and some in the current study: (1) the formula given by Hedges (Test validity, Lawrence Erlbaum, 1988), which assumes a linear relationship between corrected and uncorrected z-transformed correlations (linear correction); (2) one derived by the first-order delta method based on the average of corrected z-transformed correlations (stabilized first-order correction); and (3) one derived by the second-order delta method based on the average of corrected z-transformed correlations (stabilized second-order correction). Via a simulation study, we compared the performance of these estimators and of the sampling error variance estimator uncorrected for measurement error in terms of the estimation and inference accuracy of the mean correlation, as well as the homogeneity test of effect sizes. In obtaining the corrected z-transformed correlations and within-study sampling error variances, coefficient alpha was used as a common reliability coefficient estimate. The results showed that, in terms of the estimated mean correlation, sampling error variances with linear correction, the stabilized first-order and second-order corrections, and no correction performed similarly in general. Furthermore, in terms of the homogeneity test, given a relatively large average sample size and normal true scores, the stabilized first-order and second-order corrections had type I error rates that were generally controlled as well as or better than those of the other estimators. Overall, the stabilized first-order and second-order corrections are recommended when true scores are normal, reliabilities are acceptable, the number of items per psychological scale is relatively large, and the average sample size is relatively large.
{"title":"Correcting for measurement error under meta-analysis of z-transformed correlations","authors":"Qian Zhang, Qi Wang","doi":"10.1111/bmsp.12328","DOIUrl":"10.1111/bmsp.12328","url":null,"abstract":"<p>This study mainly concerns correction for measurement error using the meta-analysis of Fisher's z-transformed correlations. The disattenuation formula of Spearman (American Journal of Psychology, <b>15</b>, 1904, 72) is used to correct for individual raw correlations in primary studies. The corrected raw correlations are then used to obtain the corrected z-transformed correlations. What remains little studied, however, is how to best correct for within-study sampling error variances of corrected z-transformed correlations. We focused on three within-study sampling error variance estimators corrected for measurement error that were proposed in earlier studies and is proposed in the current study: (1) the formula given by Hedges (<i>Test validity</i>, Lawrence Erlbaum, 1988) assuming a linear relationship between corrected and uncorrected z-transformed correlations (linear correction), (2) one derived by the first-order delta method based on the average of corrected z-transformed correlations (stabilized first-order correction), and (3) one derived by the second-order delta method based on the average of corrected z-transformed correlations (stabilized second-order correction). Via a simulation study, we compared performance of these estimators and the sampling error variance estimator uncorrected for measurement error in terms of estimation and inference accuracy of the mean correlation as well as the homogeneity test of effect sizes. In obtaining the corrected z-transformed correlations and within-study sampling error variances, coefficient alpha was used as a common reliability coefficient estimate. The results showed that in terms of the estimated mean correlation, sampling error variances with linear correction, the stabilized first-order and second-order corrections, and no correction performed similarly in general. Furthermore, in terms of the homogeneity test, given a relatively large average sample size and normal true scores, the stabilized first-order and second-order corrections had type I error rates that were generally controlled as well as or better than the other estimators. Overall, stabilized first-order and second-order corrections are recommended when true scores are normal, reliabilities are acceptable, the number of items per psychological scale is relatively large, and the average sample size is relatively large.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":"77 2","pages":"261-288"},"PeriodicalIF":1.8,"publicationDate":"2023-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139059109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wan-Lun Wang, Luis M. Castro, Huei-Jyun Li, Tsung-I Lin
Analysing data from educational tests allows governments to make decisions for improving the quality of life of individuals in a society. One of the key responsibilities of statisticians is to develop models that provide decision-makers with pertinent information about the latent process that educational tests seek to represent. Mixtures of