The main goal of fine-mapping is to identify genetic variants that have a causal effect on a trait of interest, such as the presence of a disease. From a statistical point of view, fine-mapping can be seen as a variable selection problem. Fine-mapping methods are often challenging to apply because of linkage disequilibrium (LD), which produces regions of the genome where the variants interrogated are highly correlated. Several methods have been proposed to address this issue. Here we explore the ‘Sum of Single Effects’ (SuSiE) method, applied to real data (summary statistics) from a genome-wide meta-analysis of the autoimmune liver disease primary biliary cholangitis (PBC). Fine-mapping in this data set was previously performed using the FINEMAP program; we compare these previous results with those obtained from SuSiE, which provides an arguably more convenient and principled way of generating ‘credible sets’, that is, sets of predictors that are correlated with the response variable. This allows us to appropriately acknowledge the uncertainty when selecting the causal effects for the trait. We focus on the results from SuSiE-RSS, which fits the SuSiE model to summary statistics, such as z-scores, along with a correlation matrix. We also compare the SuSiE results to those obtained using a more recently developed method, h2-D2, which uses the same inputs. Overall, we find the results from SuSiE-RSS and, to a lesser extent, h2-D2, to be quite concordant with those previously obtained using FINEMAP. The genes and biological pathways implicated are therefore also similar to those previously obtained, providing valuable confirmation of the previously reported results. Detailed examination of the credible sets identified suggests that, although for the majority of the loci (33 out of 56) the results from SuSiE-RSS seem most plausible, there are some loci (5 out of 56) where the results from h2-D2 seem more compelling. Computer simulations suggest that SuSiE-RSS generally has slightly higher power, better precision, and a better ability to identify the true number of causal variants in a region than h2-D2, although there are some scenarios where the power of h2-D2 is higher. Thus, in real data analysis, the use of complementary approaches such as both SuSiE and h2-D2 is potentially warranted.
Source: Aida Gjoka, Heather J. Cordell. "Fine-Mapping the Results From Genome-Wide Association Studies of Primary Biliary Cholangitis Using SuSiE and h2-D2". Genetic Epidemiology 49(1), published 2024-10-06. doi:10.1002/gepi.22592. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22592
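The idea of a credible set built from z-scores, as discussed in the abstract above, can be illustrated with a minimal Python sketch. This is not SuSiE, SuSiE-RSS, h2-D2, or FINEMAP: it assumes exactly one causal variant in the region, a normal prior on that variant's z-score noncentrality, and equal prior probability for every variant. Under those assumptions the correlation matrix is not needed, which is why it does not appear here. The function names, the `prior_var` default, and the example z-scores are all hypothetical.

```python
import numpy as np


def single_effect_pips(z, prior_var=25.0):
    """Posterior inclusion probabilities assuming exactly one causal variant.

    z         : marginal z-scores for the variants in one region
    prior_var : assumed prior variance of the causal variant's z-score
                noncentrality (hypothetical default, not taken from the paper)
    """
    z = np.asarray(z, dtype=float)
    shrink = prior_var / (1.0 + prior_var)
    # Log Bayes factor of "variant j is the causal one" against the null model.
    log_bf = -0.5 * np.log1p(prior_var) + 0.5 * shrink * z**2
    log_bf -= log_bf.max()  # subtract the maximum to avoid overflow in exp
    weights = np.exp(log_bf)
    # Equal prior weight per variant, so the PIPs are normalised Bayes factors.
    return weights / weights.sum()


def credible_set(pips, coverage=0.95):
    """Smallest set of variant indices whose PIPs sum to at least `coverage`."""
    pips = np.asarray(pips, dtype=float)
    order = np.argsort(pips)[::-1]  # most probable variants first
    cumulative = np.cumsum(pips[order])
    n_keep = int(np.searchsorted(cumulative, coverage)) + 1
    return order[:n_keep]


# Hypothetical region: variants 2 and 3 are in high LD and both look associated.
example_z = [0.5, 1.0, 6.0, 5.8, 0.2]
example_pips = single_effect_pips(example_z)
example_cs = credible_set(example_pips)
print("PIPs:", np.round(example_pips, 3))
print("95% credible set (variant indices):", sorted(example_cs.tolist()))
```

When two variants in high LD have similar z-scores, neither reaches a high inclusion probability alone, so the credible set keeps both. That is the sense in which a credible set acknowledges uncertainty about which variant is causal. Methods such as SuSiE extend this single-effect picture to several causal variants per region, which is where the correlation matrix becomes essential.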
Many statistical genetics analysis methods make use of GWAS summary statistics. Best statistical practice requires evaluating these methods in realistic simulation experiments. However, simulating summary statistics by first simulating individual-level genotype and phenotype data is extremely computationally demanding. This high cost may force researchers to conduct overly simplistic simulations that fail to accurately measure method performance. Alternatively, summary statistics can be simulated directly from their theoretical distribution. Although this is a common need among statistical genetics researchers, no software packages exist for comprehensive GWAS summary statistic simulation. We present GWASBrewer, an open-source R package for direct simulation of GWAS summary statistics. We show that statistics simulated by GWASBrewer have the same distribution as statistics generated from individual-level data, and can be produced at a fraction of the computational expense. Additionally, GWASBrewer can simulate standard error estimates, something that is typically not done when sampling summary statistics directly. GWASBrewer is highly flexible, allowing the user to simulate data for multiple traits connected by causal effects and with complex distributions of effect sizes. We demonstrate example uses of GWASBrewer for evaluating Mendelian randomization, polygenic risk score, and heritability estimation methods.
Source: Jean Morrison. "GWASBrewer: An R Package for Simulating Realistic GWAS Summary Statistics". Genetic Epidemiology 49(1), published 2024-10-06. doi:10.1002/gepi.22594. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22594
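To make "simulated directly from their theoretical distribution" concrete, here is a minimal Python sketch of one commonly used approximation. It is not GWASBrewer's interface or its exact model. It assumes standardized genotypes and phenotype, small per-variant effects, and a known LD correlation matrix R, under which the z-scores are approximately distributed as N(sqrt(n) R b, R) for joint standardized effects b and sample size n. The function name, arguments, and example values are hypothetical, and standard errors are not simulated.

```python
import numpy as np


def simulate_z_scores(ld, joint_effects, n_samples, rng):
    """Draw one vector of GWAS z-scores from z ~ N(sqrt(n) * R @ b, R).

    ld            : LD correlation matrix R between variants
    joint_effects : standardized joint (causal) effect sizes b
    n_samples     : GWAS sample size n
    rng           : numpy random Generator
    """
    ld = np.asarray(ld, dtype=float)
    b = np.asarray(joint_effects, dtype=float)
    # LD spreads each causal effect onto correlated neighbours in the mean.
    mean = np.sqrt(n_samples) * (ld @ b)
    # Build a matrix square root of R by eigendecomposition, which also
    # works when R is singular (Cholesky would fail in that case).
    eigvals, eigvecs = np.linalg.eigh(ld)
    root = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    noise = root @ rng.standard_normal(len(b))
    return mean + noise


# Hypothetical region: 5 variants with correlation 0.9 ** |i - j|,
# one causal variant (index 2), and 10,000 samples.
positions = np.arange(5)
example_ld = 0.9 ** np.abs(positions[:, None] - positions[None, :])
example_b = np.array([0.0, 0.0, 0.05, 0.0, 0.0])
example_rng = np.random.default_rng(1)
print(np.round(simulate_z_scores(example_ld, example_b, 10_000, example_rng), 2))
```

No individual-level genotypes or phenotypes are generated at any point, which is the source of the computational saving the abstract describes: the cost depends on the number of variants in the region rather than on the sample size.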