Background: Dichotomization using the lower quartile as cutoff is commonly used for harmonizing heterogeneous physical activity (PA) measures across studies. However, this may create misclassification and hinder discovery of new loci.
Objectives: This study aimed to evaluate the performance of selecting individuals from the extremes of the exposure (SIEE) as an alternative approach to reduce such misclassification.
Method: For systolic and diastolic blood pressure in the Framingham Heart Study, we performed a genome-wide association study with gene-PA interaction analysis using three PA variables derived by SIEE and two other dichotomization approaches. We compared the number of loci detected and their overlap with loci found using a quantitative PA variable. In addition, we performed simulation studies to assess bias, false discovery rates (FDR), and power under synergistic/antagonistic genetic effects in exposure groups and in the presence/absence of measurement error.
Results: In the empirical analysis, SIEE's performance was neither the best nor the worst. In most simulation scenarios, SIEE was outperformed in terms of FDR and power. However, in a scenario characterized by antagonistic effects and measurement error, SIEE had the least bias and highest power.
Conclusion: SIEE's promise appears limited to detecting loci with antagonistic effects. Further studies are needed to evaluate SIEE's full advantage.
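The contrast between lower-quartile dichotomization and SIEE described in the Method paragraph can be sketched in a few lines. This is only an illustration: the function names and the 25th/75th percentile tail cutoffs are assumptions, since the abstract does not state which cutoffs define the three SIEE variables.

```python
import numpy as np

def lower_quartile_dichotomize(pa):
    """Label everyone at or below the lower quartile as inactive (0), the rest as active (1)."""
    pa = np.asarray(pa, dtype=float)
    cutoff = np.quantile(pa, 0.25)
    return (pa > cutoff).astype(int)

def select_extremes(pa, lower=0.25, upper=0.75):
    """SIEE-style selection (cutoffs are assumed): keep only the two tails.

    Returns the indices of retained individuals and their 0/1 labels
    (0 = lower tail, 1 = upper tail). Individuals in the middle are dropped.
    """
    pa = np.asarray(pa, dtype=float)
    lo, hi = np.quantile(pa, [lower, upper])
    keep = (pa <= lo) | (pa >= hi)
    labels = (pa[keep] >= hi).astype(int)
    return np.flatnonzero(keep), labels

pa = np.arange(1, 101)                  # toy PA scores 1..100
binary = lower_quartile_dichotomize(pa)
kept, labels = select_extremes(pa)
print(binary.sum(), len(kept), labels.sum())
```

The sketch makes the trade-off visible: SIEE separates the groups more cleanly but discards the middle of the distribution, so the analysis sample shrinks.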
Genotype imputation is the process of estimating missing genotypes from a haplotype or genotype reference panel. It can effectively boost the power to detect single nucleotide polymorphisms (SNPs) in genome-wide association studies, integrate multiple studies for meta-analysis, and be applied in fine-mapping studies. The performance of genotype imputation is affected by many factors, including software, reference selection, sample size, and SNP density/sequencing coverage. A systematic evaluation of the imputation performance of currently popular software will benefit future studies. Here, we evaluate the imputation performance of Beagle4.1, IMPUTE2, MACH+Minimac3, and SHAPEIT2+IMPUTE2 using test samples of East Asian ancestry and reference panels from the 1000 Genomes Project. The results indicated that the accuracy of IMPUTE2 (99.18%) was slightly higher than that of the others (Beagle4.1: 98.94%, MACH+Minimac3: 98.51%, and SHAPEIT2+IMPUTE2: 99.08%). To achieve good and stable imputation quality, SNP density needs to be > 200/Mb. The imputation accuracies of IMPUTE2 and Beagle4.1 were only slightly influenced by the study sample size. The extent to which the reference panel contributed to imputation performance depended on the software used. We also assessed imputation performance on SNPs generated by next-generation whole-genome sequencing and found that most SNPs detected by sequencing at 15× depth could be recovered by imputing from the 1000 Genomes Project haplotype reference panel, starting from SNP data detected by sequencing at 4× depth. All of the imputation software performed worse for SNPs with low minor allele frequency because of bias in the reference or the software. In the future, more comprehensive reference panels or new algorithms may rise to this challenge.
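As a rough illustration of the two quantities discussed above, the sketch below computes a genotype concordance rate and a SNP density. The abstract does not define how accuracy was measured, so treating it as the fraction of matching genotype calls is an assumption, as are the function names and the use of -1 to code a missing true genotype.

```python
import numpy as np

def genotype_concordance(true_genotypes, imputed_genotypes):
    """Fraction of genotype calls (coded 0/1/2) that match the truth.

    Assumed convention: a true genotype of -1 is missing and is skipped.
    """
    truth = np.asarray(true_genotypes)
    imputed = np.asarray(imputed_genotypes)
    if truth.shape != imputed.shape:
        raise ValueError("genotype arrays must have the same shape")
    observed = truth >= 0
    if not observed.any():
        return float("nan")
    return float(np.mean(truth[observed] == imputed[observed]))

def snp_density_per_mb(n_snps, region_length_bp):
    """Number of SNPs per megabase in a region of the given length."""
    return n_snps / (region_length_bp / 1_000_000)

print(genotype_concordance([0, 1, 2, -1], [0, 1, 1, 2]))
print(snp_density_per_mb(500, 2_000_000))
```

Computing the same concordance separately within minor allele frequency bins would show the weaker performance at low-frequency SNPs that the abstract reports.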
Background: The variants identified in genome-wide association studies account for only a small fraction of disease heritability. Rare variants are believed to be a key to this "missing heritability." Specifically, we focus on rare haplotype variants (rHTVs). The existing methods for detecting rHTVs are mostly population-based and, as such, are susceptible to population stratification and admixture, leading to an inflated false-positive rate. Family-based methods are more robust in this respect.
Methods: We propose a method for detecting rHTVs associated with quantitative traits called family-based quantitative Bayesian LASSO (famQBL). FamQBL can analyze any type of pedigree and is based on a mixed model framework. We regularize the haplotype effects using Bayesian LASSO and estimate the posterior distributions using Markov chain Monte Carlo methods.
Results: We conduct simulation studies, including analyses of Genetic Analysis Workshop 18 simulated data, to study the properties of famQBL and compare it with a standard family-based haplotype association test implemented in the FBAT (family-based association test) software. We find famQBL to be more powerful than FBAT, with well-controlled false-positive rates. We also apply famQBL to the Framingham Heart Study data and detect an rHTV associated with diastolic blood pressure.
Conclusion: FamQBL can help uncover rHTVs associated with quantitative traits.
Objectives: Classical methods for combining summary data from genome-wide association studies only use marginal genetic effects, and power can be compromised in the presence of heterogeneity. We aim to enhance the discovery of novel associated loci in the presence of heterogeneity of genetic effects in subgroups defined by an environmental factor.
Methods: We present a p-value-assisted subset testing for associations (pASTA) framework that generalizes the previously proposed association analysis based on subsets (ASSET) method by incorporating gene-environment (G-E) interactions into the testing procedure. We conduct simulation studies and provide two data examples.
Results: Simulation studies show that our proposal is more powerful than methods based on marginal associations in the presence of G-E interactions and maintains comparable power even in their absence. Both data examples demonstrate that our method can increase power to detect overall genetic associations and identify novel studies/phenotypes that contribute to the association.
Conclusions: Our proposed method can be a useful screening tool to identify candidate single nucleotide polymorphisms that are potentially associated with the trait(s) of interest for further validation. It also allows researchers to determine the most probable subset of traits that exhibit genetic associations in addition to the enhancement of power.
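The subset-search idea that pASTA builds on can be sketched as follows. This is not the pASTA procedure itself: it shows only a generic ASSET-style search over study subsets using marginal z-scores, with sample-size weights assumed and the G-E interaction component omitted. The function name and inputs are hypothetical, and the maximized statistic would still need a null distribution that accounts for the search before any p-value could be reported.

```python
from itertools import combinations
import math

def best_subset_z(z_scores, sample_sizes):
    """Exhaustively search nonempty subsets S of studies for the largest |Z(S)|.

    Z(S) = sum over k in S of sqrt(n_k / N_S) * z_k, where N_S is the total
    sample size in S (a sample-size-weighted meta-analysis z-score).
    Returns (max |Z(S)|, tuple of study indices in the best subset).
    """
    n_studies = len(z_scores)
    best_stat, best_subset = 0.0, ()
    for size in range(1, n_studies + 1):
        for subset in combinations(range(n_studies), size):
            total_n = sum(sample_sizes[k] for k in subset)
            z_subset = sum(
                math.sqrt(sample_sizes[k] / total_n) * z_scores[k] for k in subset
            )
            if abs(z_subset) > best_stat:
                best_stat, best_subset = abs(z_subset), subset
    return best_stat, best_subset

# Two studies carry signal; the third adds only noise in the opposite direction.
stat, subset = best_subset_z([3.0, 3.0, -0.5], [1000, 1000, 1000])
print(round(stat, 3), subset)
```

In this toy example, pooling all three studies dilutes the signal, whereas the search returns the two studies that drive the association. The exhaustive loop grows as 2^K, so it is practical only for a modest number of studies or traits.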
Objectives: There is evidence that genetic and epigenetic variation each independently affect asthma pathogenesis, and some evidence that genetic-epigenetic interactions affect asthma risk. However, little research has been done to identify such interactions on a genome-wide scale. The aim of this study was to identify genes with genetic-epigenetic interactions associated with asthma.
Methods: Using asthma case-control data, we applied a novel nonparametric gene-centric approach to test for interactions between multiple SNPs and CpG sites simultaneously in the vicinities of 18,178 genes across the genome.
Results: Twelve genes, PF4, ATF3, TPRA1, HOPX, SCARNA18, STC1, OR10K1, UPK1B, LOC101928523, LHX6, CHMP4B, and LANCL1, exhibited statistically significant SNP-CpG interactions (false discovery rate = 0.05). Of these, three have previously been implicated in asthma risk (PF4, ATF3, and TPRA1). Follow-up analysis revealed statistically significant pairwise SNP-CpG interactions for several of these genes, including SCARNA18, LHX6, and LOC101928523 (p = 1.33E-04, 8.21E-04, 1.11E-03, respectively).
Conclusions: Joint effects of genetic and epigenetic variation may play an important role in asthma pathogenesis. Statistical methods that simultaneously account for multiple variations across chromosomal regions may be needed to detect these types of effects on a genome-wide scale.
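For readers unfamiliar with false discovery rate thresholds such as the 0.05 level reported in the Results, the sketch below shows one common way to apply them. The abstract does not say which FDR procedure was used; the Benjamini-Hochberg step-up procedure is assumed here purely for illustration, and the p-values are made up.

```python
import numpy as np

def benjamini_hochberg(p_values, fdr=0.05):
    """Return a boolean array marking the tests rejected at the given FDR level.

    Step-up rule: sort the m p-values, find the largest rank k with
    p_(k) <= fdr * k / m, and reject the k smallest p-values.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = fdr * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.flatnonzero(passed))
        reject[order[: k + 1]] = True
    return reject

# Hypothetical gene-level p-values, not taken from the study.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(p_values).tolist())
```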
Objectives: An interesting consequence of consanguinity is that the inbred singleton becomes informative for genetic variance. We determine the contribution of an inbred singleton to variance component analysis of heritability and linkage.
Methods: Statistical theory for the power of variance component analysis of quantitative traits is used to determine the expected contribution of an inbred singleton to likelihood-ratio tests of heritability and linkage.
Results: In variance component models, an inbred singleton contributes relatively little to a test of heritability but can contribute substantively to a test of linkage. For small-to-moderate quantitative trait locus (QTL) effects and a level of inbreeding comparable to matings between first cousins (the preferred form of union in many human populations), an inbred singleton can carry nearly 25% of the information of a non-inbred sib pair. In more highly inbred contexts available with experimental animal populations, nonhuman primate colonies, and some human subpopulations, the contribution of an inbred singleton relative to a sib pair can exceed 50%.
Conclusions: Inbred individuals, even in isolation from other members of a sample, can contribute to variance component estimation and tests of heritability and linkage. Under certain conditions, the informativeness of the inbred singleton can approach that of a non-inbred sib pair.
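To make "a level of inbreeding comparable to matings between first cousins" concrete, the inbreeding coefficient can be computed with Wright's path-counting formula, F = Σ (1/2)^(n1 + n2 + 1) (1 + F_A), summed over common ancestors of the two parents. The sketch below is a small illustration of that standard formula; the function name and input format are assumptions, and it is not part of the variance component theory used in the study.

```python
from fractions import Fraction

def inbreeding_coefficient(paths):
    """Wright's path-counting formula for an individual's inbreeding coefficient.

    `paths` holds one tuple (n1, n2, f_ancestor) per path through a common
    ancestor of the two parents: n1 and n2 are the numbers of generations from
    each parent back to that ancestor, and f_ancestor is the ancestor's own
    inbreeding coefficient.
    """
    return sum(
        Fraction(1, 2) ** (n1 + n2 + 1) * (1 + Fraction(f_ancestor))
        for n1, n2, f_ancestor in paths
    )

# Offspring of first cousins: two shared grandparents, each two generations
# back from both parents.
print(inbreeding_coefficient([(2, 2, 0), (2, 2, 0)]))  # 1/16

# Offspring of full siblings: two shared parents, one generation back.
print(inbreeding_coefficient([(1, 1, 0), (1, 1, 0)]))  # 1/4
```

The first case corresponds to the first-cousin setting mentioned in the Results; the second is an example of the more highly inbred contexts found in some experimental populations.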