Vine copula based structural equation models
Claudia Czado
Computational Statistics & Data Analysis, volume 203, Article 108076 (published 2024-10-26)
DOI: 10.1016/j.csda.2024.108076
https://www.sciencedirect.com/science/article/pii/S0167947324001609
Citations: 0
Abstract
Gaussian linear structural equation models (SEMs) are often used as the statistical model associated with a directed acyclic graph (DAG), also known as a Bayesian network. However, such a model may be unable to represent the non-Gaussian dependence present in some data sets, which results in nonlinear, non-additive and non-Gaussian conditional distributions. Therefore, the class of D-vine copula based regression models is proposed for specifying the conditional distribution of a node given its parents. This class considerably extends the class of standard linear regression models. The approach also makes it possible to order the parents of each node by importance and to remove edges from the starting DAG that are not supported by the data. Furthermore, the uncertainty of conditional estimates can be assessed, and fast generative simulation from the D-vine copula based SEM is available. Simulations with random specifications of the D-vine based SEM show the improvement over a Gaussian linear SEM, as well as the ability of the approach to correctly remove edges that were not present in the data generation. An engineering application showcases the usefulness of the proposals.
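The generative-simulation idea in the abstract can be illustrated with a minimal sketch under strong simplifying assumptions. Every node in the hypothetical DAG X1 -> X2 -> X3 has exactly one parent, so the D-vine for each node collapses to a single bivariate pair copula. The Clayton family, the parameter values, the margins and all function names below are illustrative assumptions, not taken from the paper.

```python
import math
import random
from statistics import NormalDist

EPS = 1e-12  # keeps probabilities strictly inside (0, 1)


def clamp_unit(p):
    """Clamp a probability to the open unit interval."""
    return min(max(p, EPS), 1.0 - EPS)


def clayton_h(v, u, theta):
    """Conditional CDF C(v | u) of a Clayton copula, theta > 0."""
    inner = u ** (-theta) + v ** (-theta) - 1.0
    return u ** (-theta - 1.0) * inner ** (-1.0 / theta - 1.0)


def clayton_h_inverse(w, u, theta):
    """Solve C(v | u) = w for v (closed form for the Clayton family)."""
    scaled = (w ** (-theta / (1.0 + theta)) - 1.0) * u ** (-theta) + 1.0
    return clamp_unit(scaled ** (-1.0 / theta))


def simulate_chain(n, theta12=2.0, theta23=1.5, seed=0):
    """Draw n samples from the hypothetical DAG X1 -> X2 -> X3."""
    rng = random.Random(seed)
    normal = NormalDist()
    samples = []
    for _ in range(n):
        # Root node: draw on the copula (uniform) scale.
        u1 = clamp_unit(rng.random())
        # Child nodes, in topological order: invert the conditional CDF
        # given the parent's value on the uniform scale.
        u2 = clayton_h_inverse(clamp_unit(rng.random()), u1, theta12)
        u3 = clayton_h_inverse(clamp_unit(rng.random()), u2, theta23)
        # Map to the data scale with assumed margins.
        x1 = normal.inv_cdf(u1)   # standard normal margin (assumption)
        x2 = -math.log1p(-u2)     # Exp(1) margin (assumption)
        x3 = normal.inv_cdf(u3)   # standard normal margin (assumption)
        samples.append((x1, x2, x3))
    return samples


def kendall_tau(xs, ys):
    """Plain O(n^2) Kendall's tau; tied pairs contribute zero."""
    n = len(xs)
    balance = 0  # concordant pairs minus discordant pairs
    for i in range(n):
        for j in range(i + 1, n):
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            balance += (s > 0) - (s < 0)
    return 2.0 * balance / (n * (n - 1))


if __name__ == "__main__":
    x1, x2, x3 = zip(*simulate_chain(500))
    # For a Clayton copula, Kendall's tau equals theta / (theta + 2).
    print("tau(X1, X2):", round(kendall_tau(x1, x2), 3), "theory:", 2.0 / 4.0)
    print("tau(X2, X3):", round(kendall_tau(x2, x3), 3), "theory:", 1.5 / 3.5)
```

Sampling proceeds in topological order, so each child only needs its parent's value on the uniform scale. For a node with several parents, the single inverse conditional CDF would be replaced by a composition of such inverses along the D-vine, which this sketch does not attempt.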
About the journal:
Computational Statistics and Data Analysis (CSDA), an Official Publication of the network Computational and Methodological Statistics (CMStatistics) and of the International Association for Statistical Computing (IASC), is an international journal dedicated to the dissemination of methodological research and applications in the areas of computational statistics and data analysis. The journal consists of four refereed sections which are divided into the following subject areas:
I) Computational Statistics - Manuscripts dealing with: 1) the explicit impact of computers on statistical methodology (e.g., Bayesian computing, bioinformatics, computer graphics, computer-intensive inferential methods, data exploration, data mining, expert systems, heuristics, knowledge-based systems, machine learning, neural networks, numerical and optimization methods, parallel computing, statistical databases, statistical systems), and 2) the development, evaluation and validation of statistical software and algorithms. Software and algorithms can be submitted with manuscripts and will be stored together with the online article.
II) Statistical Methodology for Data Analysis - Manuscripts dealing with novel and original data analytical strategies and methodologies applied in biostatistics (design and analytic methods for clinical trials, epidemiological studies, statistical genetics, or genetic/environmental interactions), chemometrics, classification, data exploration, density estimation, design of experiments, environmetrics, education, image analysis, marketing, model-free data exploration, pattern recognition, psychometrics, statistical physics, image processing, robust procedures.
[...]
III) Special Applications - [...]
IV) Annals of Statistical Data Science [...]