Journal of the SFdS: Latest Articles


Identification in Causal Models With Hidden Variables.

Journal of the SFdS

Pub Date: 2020-07-01; Epub Date: 2020-06-30

Ilya Shpitser

Targets of inference that establish causality are phrased in terms of counterfactual responses to interventions. These potential outcomes operationalize cause-effect relationships by means of comparisons of cases and controls in hypothetical randomized controlled experiments. In many applied settings, data from such experiments are not directly available, necessitating assumptions linking the counterfactual target of inference with the factual observed data distribution. This link is provided by causal models. Originally defined on potential outcomes directly (Rubin, 1976), causal models have been extended to longitudinal settings (Robins, 1986) and reformulated as graphical models (Spirtes et al., 2001; Pearl, 2009). In settings where common causes of all observed variables are themselves observed, many causal inference targets are identified via variations of the expression referred to in the literature as the g-formula (Robins, 1986), the manipulated distribution (Spirtes et al., 2001), or the truncated factorization (Pearl, 2009). In settings where hidden variables are present, identification results become considerably more complicated. In this manuscript, we review identification theory in causal models with hidden variables for common targets that arise in causal inference applications, including causal effects; direct, indirect, and path-specific effects; and outcomes of dynamic treatment regimes. We describe a simple formulation of this theory (Tian and Pearl, 2002; Shpitser and Pearl, 2006b,a; Tian, 2008; Shpitser, 2013) in terms of causal graphical models and the fixing operator, a statistical analogue of the intervention operation (Richardson et al., 2017).
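As a rough illustration of the fully observed case mentioned in the abstract, the sketch below evaluates the g-formula p(y | do(a)) = sum over c of p(y | a, c) p(c) on a small discrete distribution. It is not taken from the paper: the three-variable setup (a single observed common cause C of treatment A and outcome Y), the function names `g_formula` and `build_joint`, and all probabilities are assumptions chosen only to make the example concrete.

```python
from collections import defaultdict


def g_formula(joint, a_value):
    """Return {y: p(Y = y | do(A = a_value))} via sum_c p(y | a, c) * p(c).

    `joint` maps (c, a, y) tuples to probabilities. Assumption: C is the only
    common cause of A and Y and is observed. Strata with p(c, a) = 0 are
    skipped, so the result is only meaningful when every c has p(c, a) > 0.
    """
    p_c = defaultdict(float)   # marginal p(c)
    p_ca = defaultdict(float)  # marginal p(c, a)
    for (c, a, _y), p in joint.items():
        p_c[c] += p
        p_ca[(c, a)] += p

    result = defaultdict(float)
    for (c, a, y), p in joint.items():
        if a != a_value or p_ca[(c, a)] == 0.0:
            continue
        # p(y | a, c) * p(c)
        result[y] += (p / p_ca[(c, a)]) * p_c[c]
    return dict(result)


def build_joint():
    """Hypothetical joint p(c, a, y) with confounding through C (made-up numbers)."""
    p_c = {0: 0.5, 1: 0.5}
    p_a1_given_c = {0: 0.2, 1: 0.8}
    joint = {}
    for c in (0, 1):
        for a in (0, 1):
            p_a = p_a1_given_c[c] if a == 1 else 1 - p_a1_given_c[c]
            p_y1 = 0.1 + 0.3 * a + 0.4 * c
            for y in (0, 1):
                p_y = p_y1 if y == 1 else 1 - p_y1
                joint[(c, a, y)] = p_c[c] * p_a * p_y
    return joint


if __name__ == "__main__":
    # Interventional probability of Y = 1 under do(A = 1); 0.6 up to rounding.
    print(g_formula(build_joint(), 1)[1])
```

In this toy distribution the observational quantity p(Y = 1 | A = 1) works out to 0.72, so adjusting for C changes the answer. When C is hidden, this simple adjustment is no longer available, which is the setting the article reviews.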


Journal of the SFdS, vol. 161, no. 1, pp. 91-119. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7685307/pdf/nihms-1063757.pdf

Citations: 0

Assessing trends in vaccine efficacy by pathogen genetic distance.

Journal of the SFdS

Pub Date: 2020-07-01

David Benkeser, Michal Juraska, Peter B Gilbert

Preventive vaccines are an effective public health intervention for reducing the burden of infectious diseases, but have yet to be developed for several major infectious diseases. Vaccine sieve analysis studies whether and how the efficacy of a vaccine varies with the genetics of the infectious pathogen, which may help guide future vaccine development and deployment. A standard statistical approach to sieve analysis compares the efficacy of the vaccine in preventing infection and disease caused by pathogen types defined dichotomously as genetically near or far from a reference pathogen strain inside the vaccine construct. For example, near may be defined by amino acid identity at all amino acid positions considered in a multiple alignment, and far by at least one amino acid difference. An alternative approach is to study the efficacy of the vaccine as a function of genetic distance from a pathogen to a reference vaccine strain, where the distance accumulates over the set of amino acid positions. We propose a nonparametric method for estimating and testing the trend in the effect of a vaccine across genetic distance. We illustrate the operating characteristics of the estimator via simulation and apply the method to a recent preventive malaria vaccine efficacy trial.
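The near/far split and the cumulative distance described in the abstract can be sketched as follows. This is only an illustration, not the authors' estimator: it assumes an unweighted mismatch count (Hamming distance) over aligned amino acid positions, and the function names and example sequences are hypothetical.

```python
def genetic_distance(pathogen, reference, positions=None):
    """Count mismatched amino acids between two aligned sequences.

    Assumption: each position contributes 0 or 1 to the distance (Hamming
    distance); a weighted distance would be an equally valid choice.
    `positions` optionally restricts the comparison to selected indices.
    """
    if len(pathogen) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    if positions is None:
        positions = range(len(reference))
    return sum(1 for i in positions if pathogen[i] != reference[i])


def near_or_far(pathogen, reference, positions=None):
    """Dichotomous type: 'near' if identical at all positions, else 'far'."""
    distance = genetic_distance(pathogen, reference, positions)
    return "near" if distance == 0 else "far"


if __name__ == "__main__":
    reference = "ACDEFG"  # hypothetical vaccine-strain sequence
    for isolate in ("ACDEFG", "ACDEFH", "TCDQFH"):
        print(isolate, genetic_distance(isolate, reference),
              near_or_far(isolate, reference))
```

The dichotomous label discards how far a "far" isolate actually is, whereas the distance keeps that information, which is what makes a trend analysis across distance possible.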


Journal of the SFdS, vol. 161, no. 1, pp. 164-175. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7685316/pdf/nihms-1645265.pdf

Citations: 0

Comparing Model Selection and Regularization Approaches to Variable Selection in Model-Based Clustering.

Journal of the SFdS

Pub Date: 2014-01-01

Gilles Celeux, Marie-Laure Martin-Magniette, Cathy Maugis-Rabusseau, Adrian E Raftery

We compare two major approaches to variable selection in clustering: model selection and regularization. Based on previous results, we select the method of Maugis et al. (2009b), which modified the method of Raftery and Dean (2006), as a current state-of-the-art model selection method. We select the method of Witten and Tibshirani (2010) as a current state-of-the-art regularization method. We compared the methods by simulation in terms of their accuracy in both classification and variable selection. In the first simulation experiment, all the variables were conditionally independent given cluster membership. We found that variable selection (of either kind) yielded substantial gains in classification accuracy when the clusters were well separated, but few gains when the clusters were close together. We found that the two variable selection methods had comparable classification accuracy, but that the model selection approach had substantially better accuracy in selecting variables. In our second simulation experiment, there were correlations among the variables given the cluster memberships. We found that the model selection approach was substantially more accurate in terms of both classification and variable selection than the regularization approach, and that both gave more accurate classifications than K-means without variable selection. However, the model selection approach is not available in very high-dimensional settings.


Journal of the SFdS, vol. 155, no. 2, pp. 57-71. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4178956/pdf/nihms-547507.pdf

Citations: 0
