Partial Homoscedasticity in Causal Discovery With Linear Models
Jun Wu; Mathias Drton
IEEE Journal on Selected Areas in Information Theory, vol. 4, pp. 639-650, published 2023-11-01 (Journal Article)
DOI: 10.1109/JSAIT.2023.3328476
https://ieeexplore.ieee.org/document/10304270/
Citations: 0
Abstract
Recursive linear structural equation models and the associated directed acyclic graphs (DAGs) play an important role in causal discovery. The classic identifiability result for this class of models states that when only observational data is available, each DAG can be identified only up to its Markov equivalence class. In contrast, recent work has shown that the DAG can be uniquely identified if the errors in the model are homoscedastic, i.e., all have the same variance. When appropriate, this equal-variance assumption yields highly scalable methods and also sheds light on fundamental information-theoretic limits and optimality in causal discovery. In this paper, we fill the gap between the two previously considered cases, in which the error variances are assumed to be either arbitrary or all equal. Specifically, we formulate a framework of partial homoscedasticity, in which the variables are partitioned into blocks and all variables within a block share the same error variance. For any such groupwise equal-variance assumption, we characterize when two DAGs give rise to identical Gaussian linear structural equation models. Furthermore, we show how the resulting distributional equivalence classes may be represented using a completed partially directed acyclic graph (CPDAG), and we give an algorithm to efficiently construct this CPDAG. In a simulation study, we demonstrate that greedy search provides an effective way to learn the CPDAG and to exploit partial knowledge about the homoscedasticity of errors in structural equation models.
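The identifiability statements above can be checked numerically on tiny examples. The Python sketch below (standard library only) is an illustration under stated assumptions, not the authors' characterization, CPDAG construction, or greedy search. It relies on a standard fact about Gaussian linear SEMs: a node's error variance equals its conditional variance given all of its predecessors in a topological order. For a covariance matrix and a candidate variable order, the sequential conditional variances (the diagonal of an LDL^T factorization) are therefore the only error variances any DAG consistent with that order can have, so the order is admissible under a block partition exactly when these variances agree within each block. The helper names `sem_covariance`, `ldl_diagonal`, and `compatible_orders` are hypothetical, and enumerating all p! orders is only feasible for very small p.

```python
from itertools import permutations


def sem_covariance(B, omega):
    """Covariance of X = B X + eps, with variables indexed in a topological order.

    B[i][j] is the coefficient of X_j in the equation for X_i (used only for j < i);
    omega[i] is the error variance of X_i. Hypothetical helper for illustration.
    """
    p = len(omega)
    S = [[0.0] * p for _ in range(p)]
    for i in range(p):
        # Cov(X_i, X_k) for earlier k follows from X_i = sum_j B[i][j] X_j + eps_i.
        for k in range(i):
            S[i][k] = S[k][i] = sum(B[i][j] * S[j][k] for j in range(i))
        # eps_i is independent of all earlier variables, so it adds omega[i].
        S[i][i] = sum(B[i][j] * S[j][i] for j in range(i)) + omega[i]
    return S


def ldl_diagonal(S):
    """Diagonal D of S = L D L^T, so D[i] = Var(X_i | X_0, ..., X_{i-1})."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    D = [0.0] * n
    for i in range(n):
        for j in range(i):
            L[i][j] = (S[i][j] - sum(L[i][k] * L[j][k] * D[k] for k in range(j))) / D[j]
        D[i] = S[i][i] - sum(L[i][k] ** 2 * D[k] for k in range(i))
    return D


def compatible_orders(S, blocks, tol=1e-9):
    """Variable orders whose implied error variances are equal within every block.

    `blocks` is assumed to be a partition of range(len(S)), e.g. [[0, 1], [2]].
    """
    result = []
    for order in permutations(range(len(S))):
        permuted = [[S[a][b] for b in order] for a in order]
        d = ldl_diagonal(permuted)
        # Error variance each variable would have in a DAG consistent with this order.
        err_var = {v: d[pos] for pos, v in enumerate(order)}
        if all(abs(err_var[v] - err_var[block[0]]) <= tol
               for block in blocks for v in block):
            result.append(order)
    return result


if __name__ == "__main__":
    # True model: X0 -> X1 with coefficient 0.8 and equal error variances 1.
    S = sem_covariance([[0.0, 0.0], [0.8, 0.0]], [1.0, 1.0])
    print(compatible_orders(S, [[0, 1]]))    # [(0, 1)]: direction identified
    print(compatible_orders(S, [[0], [1]]))  # [(0, 1), (1, 0)]: not identified
```

In the two-node example, one shared variance block leaves only the true order (the reversed model would need error variances 1.64 and about 0.61), whereas two singleton blocks, i.e. arbitrary variances, admit both orders. This mirrors the contrast between the homoscedastic and unrestricted cases that the partial-homoscedasticity framework interpolates between.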