Targets of inference that establish causality are phrased in terms of counterfactual responses to interventions. These potential outcomes operationalize cause-effect relationships by means of comparisons of cases and controls in hypothetical randomized controlled experiments. In many applied settings, data from such experiments are not directly available, necessitating assumptions that link the counterfactual target of inference with the factual observed data distribution. This link is provided by causal models. Originally defined on potential outcomes directly (Rubin, 1976), causal models have been extended to longitudinal settings (Robins, 1986) and reformulated as graphical models (Spirtes et al., 2001; Pearl, 2009). In settings where common causes of all observed variables are themselves observed, many causal inference targets are identified via variations of the expression referred to in the literature as the g-formula (Robins, 1986), the manipulated distribution (Spirtes et al., 2001), or the truncated factorization (Pearl, 2009). In settings where hidden variables are present, identification results become considerably more complicated. In this manuscript, we review identification theory in causal models with hidden variables for common targets that arise in causal inference applications, including causal effects; direct, indirect, and path-specific effects; and outcomes of dynamic treatment regimes. We describe a simple formulation of this theory (Tian and Pearl, 2002; Shpitser and Pearl, 2006a,b; Tian, 2008; Shpitser, 2013) in terms of causal graphical models and the fixing operator, a statistical analogue of the intervention operation (Richardson et al., 2017).
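For readers unfamiliar with the expression named above, the fully observed case can be written out as a short sketch. The notation here is an assumption and is not taken from the abstract: G is a directed acyclic graph over the observed variables V, A is the subset of V set to values a by intervention, and pa_G(V_i) denotes the parents of V_i in G.

```latex
% Truncated factorization / g-formula (assumed notation, see lead-in):
% drop the factors for the intervened variables A and evaluate the rest at A = a.
p\bigl(V \setminus A \mid \mathrm{do}(a)\bigr)
  = \prod_{V_i \in V \setminus A} p\bigl(V_i \mid \mathrm{pa}_G(V_i)\bigr)\Big|_{A = a}

% Familiar special case: one treatment A, outcome Y, and observed covariates C
% that satisfy the back-door criterion relative to (A, Y).
p\bigl(Y(a)\bigr) = \sum_{c} p(Y \mid A = a, C = c)\, p(C = c)
```

When some common causes are hidden, the product above can no longer be evaluated from the observed data alone, which is the setting the review addresses.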
Identification in Causal Models With Hidden Variables. Ilya Shpitser. Journal of the SFdS, published 2020-07-01. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7685307/pdf/nihms-1063757.pdf
Preventive vaccines are an effective public health intervention for reducing the burden of infectious diseases, but have yet to be developed for several major infectious diseases. Vaccine sieve analysis studies whether and how the efficacy of a vaccine varies with the genetics of the infectious pathogen, which may help guide future vaccine development and deployment. A standard statistical approach to sieve analysis compares the efficacy of the vaccine in preventing infection and disease caused by pathogen types classified dichotomously as genetically near to or far from a reference pathogen strain included in the vaccine construct. For example, "near" may be defined by amino acid identity at all amino acid positions considered in a multiple alignment, and "far" by at least one amino acid difference. An alternative approach is to study the efficacy of the vaccine as a function of the genetic distance from a pathogen to a reference vaccine strain, where the distance accumulates over the set of amino acid positions. We propose a nonparametric method for estimating and testing the trend in the effect of a vaccine across genetic distance. We illustrate the operating characteristics of the estimator via simulation and apply the method to a recent preventive malaria vaccine efficacy trial.
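The dichotomous near/far comparison described above can be illustrated with a minimal sketch. It is not the nonparametric method proposed by the authors. It assumes aligned sequences of equal length, uses a plain count of mismatched positions as the genetic distance, and estimates type-specific efficacy crudely as one minus the ratio of attack rates. All function names and the toy data are hypothetical.

```python
def hamming_distance(seq, ref):
    """Count aligned positions where seq differs from the reference strain."""
    if len(seq) != len(ref):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq, ref))


def type_specific_ve(cases_vaccine, n_vaccine, cases_placebo, n_placebo):
    """Crude efficacy: 1 - (attack rate, vaccine) / (attack rate, placebo)."""
    if cases_placebo == 0:
        raise ValueError("no placebo cases of this type; ratio is undefined")
    return 1.0 - (cases_vaccine * n_placebo) / (cases_placebo * n_vaccine)


def sieve_summary(vaccine_seqs, placebo_seqs, n_vaccine, n_placebo, ref):
    """Efficacy against 'near' (distance 0) and 'far' (distance >= 1) types.

    vaccine_seqs / placebo_seqs hold the infecting sequences of the cases
    in each arm; n_vaccine / n_placebo are the numbers randomized per arm.
    """
    def count(seqs, near):
        return sum((hamming_distance(s, ref) == 0) == near for s in seqs)

    return {
        label: type_specific_ve(
            count(vaccine_seqs, near), n_vaccine,
            count(placebo_seqs, near), n_placebo,
        )
        for label, near in (("near", True), ("far", False))
    }


# Toy data (hypothetical): 100 participants per arm, reference strain "ACDE".
summary = sieve_summary(
    vaccine_seqs=["ACDE", "ACDF", "GCDF"],
    placebo_seqs=["ACDE", "ACDE", "ACDE", "ACDE", "ACDF", "GCDF"],
    n_vaccine=100,
    n_placebo=100,
    ref="ACDE",
)
print(summary)
```

A higher efficacy against near types than far types in such a summary is the pattern a sieve analysis looks for. Replacing the near/far split with the distance itself leads to the trend question the abstract poses.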
Assessing trends in vaccine efficacy by pathogen genetic distance. David Benkeser, Michal Juraska, Peter B Gilbert. Journal of the SFdS, published 2020-07-01. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7685316/pdf/nihms-1645265.pdf
Gilles Celeux, Marie-Laure Martin-Magniette, Cathy Maugis-Rabusseau, Adrian E Raftery
We compare two major approaches to variable selection in clustering: model selection and regularization. Based on previous results, we select the method of Maugis et al. (2009b), which modified the method of Raftery and Dean (2006), as a current state-of-the-art model selection method. We select the method of Witten and Tibshirani (2010) as a current state-of-the-art regularization method. We compared the methods by simulation in terms of their accuracy in both classification and variable selection. In the first simulation experiment, all the variables were conditionally independent given cluster membership. We found that variable selection (of either kind) yielded substantial gains in classification accuracy when the clusters were well separated, but few gains when the clusters were close together. We found that the two variable selection methods had comparable classification accuracy, but that the model selection approach had substantially better accuracy in selecting variables. In our second simulation experiment, there were correlations among the variables given the cluster memberships. We found that the model selection approach was substantially more accurate in terms of both classification and variable selection than the regularization approach, and that both gave more accurate classifications than K-means without variable selection. However, the model selection approach cannot be used in very high-dimensional settings.
Comparing Model Selection and Regularization Approaches to Variable Selection in Model-Based Clustering. Gilles Celeux, Marie-Laure Martin-Magniette, Cathy Maugis-Rabusseau, Adrian E Raftery. Journal of the SFdS, published 2014-01-01. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4178956/pdf/nihms-547507.pdf