Intensive longitudinal data are often non-stationary; that is, their statistical properties, such as means and variance-covariance structures, change over time. One way to accommodate non-stationarity is to specify the key parameters that change over time as time-varying parameters (TVPs). However, the nature and dynamics of TVPs may themselves be heterogeneous across time, contexts, developmental stages, and individuals, and in relation to other biopsychosocial-cultural influences. We propose an outlier detection method designed to facilitate the detection of critical shifts in any differentiable linear or non-linear dynamic function, including the dynamic functions governing TVPs. The approach applies readily to a range of data scenarios: single- and multiple-subject designs, univariate and multivariate processes, and models with or without latent variables. We demonstrate its utility and performance with three sets of simulation studies and an empirical illustration using facial electromyography data from a laboratory emotion induction study.
{"title":"Detecting Critical Change in Dynamics Through Outlier Detection with Time-Varying Parameters.","authors":"Meng Chen, Michael D Hunter, Sy-Miin Chow","doi":"10.1111/bmsp.70010","DOIUrl":"https://doi.org/10.1111/bmsp.70010","url":null,"abstract":"<p><p>Intensive longitudinal data are often found to be non-stationary, namely, showing changes in statistical properties, such as means and variance-covariance structures, over time. One way to accommodate non-stationarity is to specify key parameters that show over-time changes as time-varying parameters (TVPs). However, the nature and dynamics of TVPs may themselves be heterogeneous across time, contexts, developmental stages, individuals and as related to other biopsychosocial-cultural influences. We propose an outlier detection method designed to facilitate the detection of critical shifts in any differentiable linear and non-linear dynamic functions, including dynamic functions for TVPs. This approach can be readily applied to various data scenarios, including single-subject and multisubject, univariate and multivariate processes, as well as with and without latent variables. We demonstrate the utility and performance of this approach with three sets of simulation studies and an empirical illustration using facial electromyography data from a laboratory emotion induction study.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2025-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145066348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Psychological research has traditionally relied on linear models to test scientific hypotheses. However, the emergence of machine learning (ML) algorithms has opened new opportunities for exploring variable relationships beyond linear constraints. To interpret the outcomes of these 'black-box' algorithms, various tools for assessing feature importance have been developed. However, most of these tools are descriptive and do not facilitate statistical inference. To address this gap, our study introduces two versions of residual permutation tests (RPTs), designed to assess the significance of a target feature in predicting the label. The first variant, RPT on Y (RPT-Y), permutes the residuals of the label conditioned on features other than the target. The second variant, RPT on X (RPT-X), permutes the residuals of the target feature conditioned on the other features. Through a comprehensive simulation study, we show that RPT-X maintains empirical Type I error rates under the nominal level across a wide range of ML algorithms and demonstrates appropriate statistical power in both regression and classification contexts. These findings suggest the utility of RPT-X for hypothesis testing in ML applications.
{"title":"Residual permutation tests for feature importance in machine learning.","authors":"Po-Hsien Huang","doi":"10.1111/bmsp.70009","DOIUrl":"https://doi.org/10.1111/bmsp.70009","url":null,"abstract":"<p><p>Psychological research has traditionally relied on linear models to test scientific hypotheses. However, the emergence of machine learning (ML) algorithms has opened new opportunities for exploring variable relationships beyond linear constraints. To interpret the outcomes of these 'black-box' algorithms, various tools for assessing feature importance have been developed. However, most of these tools are descriptive and do not facilitate statistical inference. To address this gap, our study introduces two versions of residual permutation tests (RPTs), designed to assess the significance of a target feature in predicting the label. The first variant, RPT on Y (RPT-Y), permutes the residuals of the label conditioned on features other than the target. The second variant, RPT on X (RPT-X), permutes the residuals of the target feature conditioned on the other features. Through a comprehensive simulation study, we show that RPT-X maintains empirical Type I error rates under the nominal level across a wide range of ML algorithms and demonstrates appropriate statistical power in both regression and classification contexts. These findings suggest the utility of RPT-X for hypothesis testing in ML applications.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2025-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144979556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this study, we explore parameter estimation for a joint count-time data model with a two-factor latent trait structure, representing accuracy and speed. Each count-time variable pair corresponds to a specific item on a measurement instrument, where each item consists of a fixed number of tasks. The count variable represents the number of successfully completed tasks and is modeled using a Beta-binomial distribution to account for potential over-dispersion. The time variable, representing the duration needed to complete the tasks, is modeled using a normal distribution on a logarithmic scale. To characterize the model structure, we derive marginal moments that inform a set of method-of-moments (MOM) estimators, which serve as initial values for maximum likelihood estimation (MLE) via the Monte Carlo Expectation-Maximization (MCEM) algorithm. Standard errors are estimated using both the observed information matrix and bootstrap resampling, with simulation results indicating superior performance of the bootstrap, especially near boundary values of the dispersion parameter. A comprehensive simulation study investigates estimator accuracy and computational efficiency. To demonstrate the methodology, we analyze oral reading fluency (ORF) data, showing substantial variation in item-level dispersion and providing evidence for the improved model fit of the Beta-binomial specification, assessed using standardized root mean square residuals (SRMSR).
{"title":"Joint analysis of dispersed count-time data using a bivariate latent factor model","authors":"Cornelis J. Potgieter, Akihito Kamata, Yusuf Kara, Xin Qiao","doi":"10.1111/bmsp.70005","DOIUrl":"10.1111/bmsp.70005","url":null,"abstract":"<p>In this study, we explore parameter estimation for a joint count-time data model with a two-factor latent trait structure, representing accuracy and speed. Each count-time variable pair corresponds to a specific item on a measurement instrument, where each item consists of a fixed number of tasks. The count variable represents the number of successfully completed tasks and is modeled using a Beta-binomial distribution to account for potential over-dispersion. The time variable, representing the duration needed to complete the tasks, is modeled using a normal distribution on a logarithmic scale. To characterize the model structure, we derive marginal moments that inform a set of method-of-moments (MOM) estimators, which serve as initial values for maximum likelihood estimation (MLE) via the Monte Carlo Expectation-Maximization (MCEM) algorithm. Standard errors are estimated using both the observed information matrix and bootstrap resampling, with simulation results indicating superior performance of the bootstrap, especially near boundary values of the dispersion parameter. A comprehensive simulation study investigates estimator accuracy and computational efficiency. To demonstrate the methodology, we analyze oral reading fluency (ORF) data, showing substantial variation in item-level dispersion and providing evidence for the improved model fit of the Beta-binomial specification, assessed using standardized root mean square residuals (SRMSR).</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":"79 1","pages":"207-228"},"PeriodicalIF":1.8,"publicationDate":"2025-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://bpspsychub.onlinelibrary.wiley.com/doi/epdf/10.1111/bmsp.70005","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144979632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we propose the generalized mixed reduced rank regression method, GMR3 for short. GMR3 is a regression method for a mix of numeric, binary and ordinal response variables. The predictor variables can be a mix of binary, nominal, ordinal and numeric variables. Categorical predictors are handled through optimal scaling. A majorization-minimization algorithm is derived for maximum likelihood estimation. A first series of simulation studies (Section 4) evaluates the performance of the algorithm with different types of predictor and response variables. In Section 5, we briefly discuss the choices to be made when applying the model to empirical data and give suggestions for supporting such choices. In a second simulation study (Section 6), we further examine the behaviour of the model and algorithm under different scenarios for the true rank in relation to sample size. In Section 7, we present an application of GMR3 to the 2023 Eurobarometer Surveys data set.
{"title":"Reduced rank regression for mixed predictor and response variables","authors":"Mark de Rooij, Lorenza Cotugno, Roberta Siciliano","doi":"10.1111/bmsp.70004","DOIUrl":"10.1111/bmsp.70004","url":null,"abstract":"<p>In this paper, we propose the generalized mixed reduced rank regression method, GMR<sup>3</sup> for short. GMR<sup>3</sup> is a regression method for a mix of numeric, binary and ordinal response variables. The predictor variables can be a mix of binary, nominal, ordinal and numeric variables. For dealing with the categorical predictors we use optimal scaling. A majorization-minimization algorithm is derived for maximum likelihood estimation. A series of simulation studies is shown (Section 4) to evaluate the performance of the algorithm with different types of predictor and response variables. In Section 5, we briefly discuss the choices to make when applying the model the empirical data and give suggestions for supporting such choices. In a second simulation study (Section 6), we further study the behaviour of the model and algorithm in different scenarios for the true rank in relation to sample size. In Section 7, we show an application of GMR<sup>3</sup> using the Eurobarometer Surveys data set of 2023.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":"79 1","pages":"173-206"},"PeriodicalIF":1.8,"publicationDate":"2025-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://bpspsychub.onlinelibrary.wiley.com/doi/epdf/10.1111/bmsp.70004","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144979605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The use of exponential random graph models (ERGMs) is becoming prevalent in psychology due to their ability to explain and predict the formation of edges between vertices in a network. Valid inference with ERGMs requires correctly specifying endogenous and exogenous effects as network statistics, guided by theory, to represent the network-generating process while ensuring key effects shaping network topology are not omitted. However, specifying a comprehensive model is challenging, particularly when relying on a single model. Despite this, most applied research continues to use a single ERGM, raising two concerns: Selecting misspecified models compromises valid statistical inference, and single-model inference ignores uncertainty in model selection. One approach to addressing these issues is Bayesian model averaging (BMA), which evaluates multiple candidate models, accounts for uncertainty in parameter estimation and model selection, and is more robust to model misspecification than single-model inference. This tutorial provides a guide to implementing BMA for ERGMs. We illustrate its application using data from a college friendship network, with a supplementary example based on the Florentine marriage network; both focus on averaging exogenous covariate effects. We demonstrate how BMA incorporates theoretical considerations and addresses modelling challenges in ERGMs, with annotated R code provided for replication and extension.
{"title":"A tutorial on Bayesian model averaging for exponential random graph models.","authors":"Ihnwhi Heo, Jan-Willem Simons, Haiyan Liu","doi":"10.1111/bmsp.70007","DOIUrl":"https://doi.org/10.1111/bmsp.70007","url":null,"abstract":"<p><p>The use of exponential random graph models (ERGMs) is becoming prevalent in psychology due to their ability to explain and predict the formation of edges between vertices in a network. Valid inference with ERGMs requires correctly specifying endogenous and exogenous effects as network statistics, guided by theory, to represent the network-generating process while ensuring key effects shaping network topology are not omitted. However, specifying a comprehensive model is challenging, particularly when relying on a single model. Despite this, most applied research continues to use a single ERGM, raising two concerns: Selecting misspecified models compromises valid statistical inference, and single-model inference ignores uncertainty in model selection. One approach to addressing these issues is Bayesian model averaging (BMA), which evaluates multiple candidate models, accounts for uncertainty in parameter estimation and model selection, and is more robust to model misspecification than single-model inference. This tutorial provides a guide to implementing BMA for ERGMs. We illustrate its application using data from a college friendship network, with a supplementary example based on the Florentine marriage network; both focus on averaging exogenous covariate effects. We demonstrate how BMA incorporates theoretical considerations and addresses modelling challenges in ERGMs, with annotated R code provided for replication and extension.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2025-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144876847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We provide a review and commentary on recent methodological research related to item response theory (IRT) modelling of response styles in psychological measurement. Our review describes the different categories of IRT models that have been proposed, their associated assumptions and extensions, and the varying purposes they can serve. Our review also seeks to highlight some of the fundamental challenges shared across models in the study and statistical control of response style behaviour. We conclude with some thoughts regarding future directions, including the potential uses of response style models for sensitivity analysis and informed survey design and administration.
{"title":"IRT-based response style models and related methodology: Review and commentary.","authors":"Daniel M Bolt, Lionel Meng","doi":"10.1111/bmsp.70006","DOIUrl":"https://doi.org/10.1111/bmsp.70006","url":null,"abstract":"<p><p>We provide a review and commentary on recent methodological research related to item response theory (IRT) modelling of response styles in psychological measurement. Our review describes the different categories of IRT models that have been proposed, their associated assumptions and extensions, and the varying purposes they can serve. Our review also seeks to highlight some of the fundamental challenges shared across models in the study and statistical control of response style behaviour. We conclude with some thoughts regarding future directions, including the potential uses of response style models for sensitivity analysis and informed survey design and administration.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2025-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144876848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Structural equation modeling (SEM) is often seen as a complex and difficult method, especially for those who want to understand how the numbers in SEM software output are actually computed. Although many open-source SEM tools are now available-especially in the R programming environment-looking into their source code to understand the underlying calculations can still be overwhelming. This tutorial aims to provide a clear and accessible introduction to the basic computations behind standard SEM analyses. Using two well-known example datasets, we show how to manually reproduce key results such as parameter estimates, standard errors, and fit measures using simple R scripts. The focus is on clarity and understanding rather than speed or efficiency. We hope that by following this tutorial, readers will gain a better grasp of how SEM works "under the hood," and be able to apply similar ideas in their own research.
{"title":"A tutorial for understanding SEM using R: Where do all the numbers come from?","authors":"Yves Rosseel, Marc Vidal","doi":"10.1111/bmsp.70003","DOIUrl":"https://doi.org/10.1111/bmsp.70003","url":null,"abstract":"<p><p>Structural equation modeling (SEM) is often seen as a complex and difficult method, especially for those who want to understand how the numbers in SEM software output are actually computed. Although many open-source SEM tools are now available-especially in the R programming environment-looking into their source code to understand the underlying calculations can still be overwhelming. This tutorial aims to provide a clear and accessible introduction to the basic computations behind standard SEM analyses. Using two well-known example datasets, we show how to manually reproduce key results such as parameter estimates, standard errors, and fit measures using simple R scripts. The focus is on clarity and understanding rather than speed or efficiency. We hope that by following this tutorial, readers will gain a better grasp of how SEM works \"under the hood,\" and be able to apply similar ideas in their own research.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":" ","pages":""},"PeriodicalIF":1.5,"publicationDate":"2025-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144627726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Effect size estimates are now widely reported in behavioural studies. In precise-estimation or power-analysis studies, sample size planning revolves around the standard error (or variance) of the effect size, and such studies are carried out under sampling-budget constraints. Hence, the optimal allocation of resources to populations with different inherent variances is paramount, as this allocation affects the effect size variance. In this paper, a general effect size for comparing two population characteristics is defined, and, under budget constraints, we aim to minimize the variance of this general effect size. We use sequential theory to arrive at the optimal sample sizes for the corresponding populations that achieve minimum variance. The sequential method we develop is distribution-free and does not require knowledge of the population parameters. Mathematical justification of the properties enjoyed by our sequential method is laid out, along with simulation studies. Our work thus has wide applicability in effect size comparison contexts.
{"title":"Effect size comparison for populations with an application in psychology","authors":"Bhargab Chattopadhyay, Sudeep R. Bapat","doi":"10.1111/bmsp.70001","DOIUrl":"10.1111/bmsp.70001","url":null,"abstract":"<p>Effect size estimates are now widely reported in various behavioural studies. In precise estimation or power analysis studies, sample size planning revolves around the standard error (or variance) of the effect size. Note these studies are carried out under sampling-budget constraints. Hence, the optimum allocation of resources to populations with different inherent population variances is paramount as this affects the effect size variance. In this paper, a general effect size meant to compare two population characteristics is defined, and under budget constraints, we aim to optimize the variance of the general effect size. In the process, we use sequential theory to arrive at optimum sample sizes of the corresponding populations to achieve minimum variance. The sequential method we developed is a distribution-free method and does not need knowledge of population parameters. Mathematical justification of the characteristics enjoyed by our sequential method is laid out along with simulation studies. Thus, our work has wide applicability in the effect size comparison context.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":"79 1","pages":"146-172"},"PeriodicalIF":1.8,"publicationDate":"2025-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144477967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Test scores, like the sum score, can be useful for making inferences about the latent variables. The conditions under which such test scores allow for inferences of the latent variables based on a “weaker” stochastic ordering are generalized to any monotone latent variable model for which the latent variables are associated. The generality of these conditions places the sum score, or indeed any test score, well beyond a mere intuitive measure or a relic from classical test theory.
{"title":"Inferences of associated latent variables by the observable test scores","authors":"Rudy Ligtvoet","doi":"10.1111/bmsp.70002","DOIUrl":"10.1111/bmsp.70002","url":null,"abstract":"<p>Test scores, like the sum score, can be useful for making inferences about the latent variables. The conditions under which such test scores allow for inferences of the latent variables based on a “weaker” stochastic ordering are generalized to any monotone latent variable model for which the latent variables are associated. The generality of these conditions places the sum score, or indeed any test score, well beyond a mere intuitive measure or a relic from classical test theory.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":"79 1","pages":"139-145"},"PeriodicalIF":1.8,"publicationDate":"2025-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://bpspsychub.onlinelibrary.wiley.com/doi/epdf/10.1111/bmsp.70002","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144327806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Instrumental variable (IV) estimation constitutes a powerful quasi-experimental tool to estimate causal effects in observational data. The IV approach, however, rests on two crucial assumptions—the instrument relevance assumption and the exclusion restriction assumption. The latter requirement (stating that the IV is not allowed to be related to the outcome via any path other than the one going through the predictor), cannot be empirically tested in just-identified models (i.e. models with as many IVs as predictors). The present study introduces properties of non-Gaussian IV models which enable one to test whether hidden confounding between an IV and the outcome is present. Detecting exclusion restriction violations due to a direct path between the IV and the outcome, however, is restricted to the over-identified case. Based on these insights, a two-step approach is presented to test IV validity against hidden confounding in just-identified models. The performance of the approach was evaluated using Monte-Carlo simulation experiments. An empirical example from psychological research is given to illustrate the approach in practice. Recommendations for best-practice applications and future research directions are discussed. Although the current study presents important insights for developing diagnostic procedures for IV models, sound universal IV validation in the just-identified case remains a challenging task.
{"title":"Testing the validity of instrumental variables in just-identified linear non-Gaussian models","authors":"Wolfgang Wiedermann, Dexin Shi","doi":"10.1111/bmsp.70000","DOIUrl":"10.1111/bmsp.70000","url":null,"abstract":"<p>Instrumental variable (IV) estimation constitutes a powerful quasi-experimental tool to estimate causal effects in observational data. The IV approach, however, rests on two crucial assumptions—the instrument relevance assumption and the exclusion restriction assumption. The latter requirement (stating that the IV is not allowed to be related to the outcome via any path other than the one going through the predictor), cannot be empirically tested in just-identified models (i.e. models with as many IVs as predictors). The present study introduces properties of non-Gaussian IV models which enable one to test whether hidden confounding between an IV and the outcome is present. Detecting exclusion restriction violations due to a direct path between the IV and the outcome, however, is restricted to the over-identified case. Based on these insights, a two-step approach is presented to test IV validity against hidden confounding in just-identified models. The performance of the approach was evaluated using Monte-Carlo simulation experiments. An empirical example from psychological research is given to illustrate the approach in practice. Recommendations for best-practice applications and future research directions are discussed. Although the current study presents important insights for developing diagnostic procedures for IV models, sound universal IV validation in the just-identified case remains a challenging task.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":"79 1","pages":"111-138"},"PeriodicalIF":1.8,"publicationDate":"2025-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144310869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}