Siyi Chen, Zhaotong Lin, Xiaotong Shen, Ling Li, Wei Pan
{"title":"Inference of causal metabolite networks in the presence of invalid instrumental variables with GWAS summary data","authors":"Siyi Chen, Zhaotong Lin, Xiaotong Shen, Ling Li, Wei Pan","doi":"10.1002/gepi.22535","DOIUrl":null,"url":null,"abstract":"<p>We propose structural equation models (SEMs) as a general framework to infer causal networks for metabolites and other complex traits. Traditionally SEMs are used only for individual-level data under the assumption that all instrumental variables (IVs) are valid. To overcome these limitations, we propose both one- and two-sample approaches for causal network inference based on SEMs that can: (1) perform causal analysis and discover causal relationships among multiple traits; (2) account for the possible presence of some invalid IVs; (3) allow for data analysis using only genome-wide association studies (GWAS) summary statistics when individual-level data are not available; (4) consider the possibility of bidirectional relationships between traits. Our method employs a simple stepwise selection to identify invalid IVs, thus avoiding false positives while possibly increasing true discoveries based on two-stage least squares (2SLS). We use both real GWAS data and simulated data to demonstrate the superior performance of our method over the standard 2SLS/SEMs. For real data analysis, our proposed approach is applied to a human blood metabolite GWAS summary data set to uncover putative causal relationships among the metabolites; we also identify some metabolites (putative) causal to Alzheimer's disease (AD), which, along with the inferred causal metabolite network, suggest some possible pathways of metabolites involved in AD.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"47 8","pages":"585-599"},"PeriodicalIF":1.7000,"publicationDate":"2023-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22535","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Genetic Epidemiology","FirstCategoryId":"3","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/gepi.22535","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"GENETICS & HEREDITY","Score":null,"Total":0}
引用次数: 0
Abstract
We propose structural equation models (SEMs) as a general framework to infer causal networks for metabolites and other complex traits. Traditionally SEMs are used only for individual-level data under the assumption that all instrumental variables (IVs) are valid. To overcome these limitations, we propose both one- and two-sample approaches for causal network inference based on SEMs that can: (1) perform causal analysis and discover causal relationships among multiple traits; (2) account for the possible presence of some invalid IVs; (3) allow for data analysis using only genome-wide association studies (GWAS) summary statistics when individual-level data are not available; (4) consider the possibility of bidirectional relationships between traits. Our method employs a simple stepwise selection to identify invalid IVs, thus avoiding false positives while possibly increasing true discoveries based on two-stage least squares (2SLS). We use both real GWAS data and simulated data to demonstrate the superior performance of our method over the standard 2SLS/SEMs. For real data analysis, our proposed approach is applied to a human blood metabolite GWAS summary data set to uncover putative causal relationships among the metabolites; we also identify some metabolites (putative) causal to Alzheimer's disease (AD), which, along with the inferred causal metabolite network, suggest some possible pathways of metabolites involved in AD.
期刊介绍:
Genetic Epidemiology is a peer-reviewed journal for discussion of research on the genetic causes of the distribution of human traits in families and populations. Emphasis is placed on the relative contribution of genetic and environmental factors to human disease as revealed by genetic, epidemiological, and biologic investigations.
Genetic Epidemiology primarily publishes papers in statistical genetics, a research field that is primarily concerned with development of statistical, bioinformatical, and computational models for analyzing genetic data. Incorporation of underlying biology and population genetics into conceptual models is favored. The Journal seeks original articles comprising either applied research or innovative statistical, mathematical, computational, or genomic methodologies that advance studies in genetic epidemiology. Other types of reports are encouraged, such as letters to the editor, topic reviews, and perspectives from other fields of research that will likely enrich the field of genetic epidemiology.