Regulation of gene expression is a vital component of neurological homeostasis. Cataloging the consequences of endogenous gene expression on the physical structure and connectivity of the brain offers a means of unifying trait-associated genetic variation with trait-associated neurological features. We perform tissue-specific transcriptome-wide association studies (TWASs) on over 3,400 neuroimaging phenotypes in the UK Biobank (N = 33,224) using our joint-tissue imputation (JTI)-TWAS method. We identify highly significant associations between predicted expression for 7,192 genes and a wide variety of measures of the brain derived from magnetic resonance imaging (MRI). Our approach generates reproducible results in internal and external replication datasets. Genetically determined expression alone is sufficient for high-fidelity reconstruction of brain structure and organization. We demonstrate complementary benefits of cross-tissue and single-tissue analyses toward an integrated neurobiology and provide evidence that gene expression outside the central nervous system provides unique insights into brain health. As an application, we provide evidence suggesting that the genetically regulated expression of schizophrenia risk genes causally affects over 73% of neurological phenotypes that are altered in individuals with schizophrenia (as identified by neuroimaging studies). Imaging features associated with neuropsychiatric traits can provide valuable insights into underlying pathophysiology. By linking neuroimaging-derived phenotypes with expression levels of specific genes, this resource represents a powerful gene prioritization schema that can improve our understanding of brain function, development, and disease. The use of multiple different cortical and subcortical atlases in the resource facilitates direct integration of these data with findings from a diverse range of clinical neuroimaging studies.
A health workforce capable of implementing genomic medicine requires effective genomics education. Genomics education interventions developed for health professions over the last two decades, and their impact, are variably described in the literature. To inform an evaluation framework for genomics education, we undertook an exploratory scoping review of published needs assessments for, and/or evaluations of, genomics education interventions for health professionals from 2000 to 2023. We retrieved and screened 4,659 records across the two searches, of which 363 were selected for full-text review and consideration by an interdisciplinary working group. In total, 104 articles were selected for inclusion in the review: 60 needs assessments, 52 genomics education evaluations, and eight describing both. Included articles spanned all years and described education interventions in over 30 countries. Target audiences included medical specialists, nurses/midwives, and/or allied health professionals. Evaluation questions, outcomes, and measures were extracted, categorized, and tabulated to iteratively compare measures across stages of genomics education evaluation: planning (pre-implementation), development and delivery (implementation), and impact (immediate, intermediate, or long-term outcomes). They are presented here along with descriptions of study designs. We document the wide variability in evaluation approaches and terminology used to define measures and note that few articles considered downstream (long-term) outcomes of genomics education interventions. Alongside the evaluation framework for genomics education, results from this scoping review form part of a toolkit to help educators undertake rigorous genomics evaluation that is fit for purpose and can contribute to the growing evidence base for the contribution of genomics education to implementation strategies for genomic medicine.
Histone deacetylase 3 (HDAC3) is a crucial epigenetic modulator essential for various developmental and physiological functions. Although its dysfunction is increasingly recognized in abnormal phenotypes, to our knowledge, there have been no established reports of human diseases directly linked to HDAC3 dysfunction. Using trio exome sequencing and extensive phenotypic analysis, we correlated heterozygous de novo variants in HDAC3 with a neurodevelopmental disorder having variable clinical presentations, frequently associated with intellectual disability, developmental delay, epilepsy, and musculoskeletal abnormalities. In a cohort of six individuals, we identified missense variants in HDAC3 (c.277G>A [p.Asp93Asn], c.328G>A [p.Ala110Thr], c.601C>T [p.Pro201Ser], c.797T>C [p.Leu266Ser], c.799G>A [p.Gly267Ser], and c.1075C>T [p.Arg359Cys]), all located in evolutionarily conserved sites and confirmed as de novo. Experimental studies identified defective deacetylation activity in the p.Asp93Asn, p.Pro201Ser, p.Leu266Ser, and p.Gly267Ser variants, which are positioned near the enzymatic pocket. In addition, proteomic analysis employing co-immunoprecipitation revealed that disrupted interactions with molecules involved in the CoREST and NCoR complexes, particularly in the p.Ala110Thr variant, constitute a central pathogenic mechanism. Moreover, immunofluorescence analysis showed a diminished nuclear-to-cytoplasmic fluorescence ratio in the p.Ala110Thr, p.Gly267Ser, and p.Arg359Cys variants, indicating impaired nuclear localization. Taken together, our study highlights that de novo missense variants in HDAC3 are associated with a broad spectrum of neurodevelopmental disorders, which emphasizes the complex role of HDAC3 in histone deacetylase activity, multi-protein complex interactions, and nuclear localization for proper physiological functions. These insights open new avenues for understanding the molecular mechanisms of HDAC3-related disorders and may inform future therapeutic strategies.
In Mendelian randomization, two methods based on single SNP-trait correlations have been developed to infer the causal direction between an exposure (e.g., a gene) and an outcome (e.g., a trait): MR Steiger's method and its recent extension, Causal Direction-Ratio (CD-Ratio). Here we propose an approach based on R2, the coefficient of determination, to combine information from multiple (possibly correlated) SNPs to simultaneously infer the presence and direction of a causal relationship between an exposure and an outcome. Our proposed method generalizes Steiger's method from using a single SNP to using multiple SNPs as IVs. It is especially useful in transcriptome-wide association studies (TWASs) (and similar applications) with typically small sample sizes for gene expression (or another molecular trait) data, providing a more flexible and powerful approach to inferring causal directions. It can be applied to GWAS summary data with a reference panel. We also discuss the influence of invalid IVs and introduce a new approach called R2S to select and remove invalid IVs (if any) to enhance robustness. We compared the performance of the proposed method with existing methods in simulations to demonstrate its advantages. We applied the methods to identify causal genes for high/low-density lipoprotein cholesterol (HDL/LDL) using the individual-level GTEx gene expression data and UK Biobank GWAS data. The proposed method was able to confirm some well-known causal genes while identifying some novel ones. Additionally, we illustrated an application of the proposed method to GWAS summary data to infer causal relationships between HDL/LDL and stroke/coronary artery disease (CAD).
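The intuition behind comparing coefficients of determination can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes individual-level data, valid instruments, and no hypothesis test, and the helper names (`r_squared`, `infer_direction`) and all simulation parameters are hypothetical. If the SNPs act on the outcome only through the exposure, they should explain more variance in the exposure than in the outcome.

```python
import numpy as np

def r_squared(G, t):
    """R^2 from an ordinary least-squares regression of trait t on SNP matrix G."""
    design = np.column_stack([np.ones(len(t)), G])  # add an intercept column
    coef, *_ = np.linalg.lstsq(design, t, rcond=None)
    resid = t - design @ coef
    return 1.0 - resid.var() / t.var()

def infer_direction(G, x, y):
    """Naive direction call: the trait better explained by the SNPs is taken as upstream."""
    r2_x = r_squared(G, x)
    r2_y = r_squared(G, y)
    direction = "X -> Y" if r2_x > r2_y else "Y -> X"
    return direction, r2_x, r2_y

# Simulated example (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n, m = 5000, 10                                       # individuals, SNPs
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)   # genotypes coded 0/1/2
x = G @ np.full(m, 0.2) + rng.normal(size=n)          # exposure driven by the SNPs
y = 0.5 * x + rng.normal(size=n)                      # outcome driven by the exposure

direction, r2_x, r2_y = infer_direction(G, x, y)
print(direction, round(r2_x, 3), round(r2_y, 3))
```

A practical analysis would additionally test whether the difference between the two R2 values is statistically significant and would guard against invalid instruments, both of which this sketch omits.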
Understanding the impact of splicing and nonsense variants on RNA is crucial for the resolution of variant classification as well as their suitability for precision medicine interventions. This is primarily enabled through RNA studies involving transcriptomics followed by targeted assays using RNA isolated from clinically accessible tissues (CATs) such as blood or skin of affected individuals. Insufficient disease gene expression in CATs does, however, pose a major barrier to RNA-based investigations, which we show is relevant to 1,436 Mendelian disease genes. We term these "silent" Mendelian genes (SMGs), the largest portion (36%) of which are associated with neurological disorders. To overcome this limitation, we developed two approaches to induce SMG expression in human dermal fibroblasts (HDFs): CRISPR-activation-based gene transactivation and fibroblast-to-neuron transdifferentiation. Initial transactivation screens involving 40 SMGs stimulated our development of a highly multiplexed transactivation system culminating in the 6- to 90,000-fold induction of expression of 20/20 (100%) SMGs tested in HDFs. Transdifferentiation of HDFs directly to neurons led to expression of 193/516 (37.4%) of SMGs implicated in neurological disease. The magnitude and isoform diversity of SMG expression following either transactivation or transdifferentiation were comparable to those of clinically relevant tissues. We apply transdifferentiation and/or gene transactivation combined with short- and long-read RNA sequencing to investigate the impact that variants in USH2A, SCN1A, DMD, and PAK3 have on RNA using HDFs derived from affected individuals. Transactivation and transdifferentiation represent rapid, scalable functional genomic solutions to investigate variants impacting SMGs in the patient cell and genomic context.
Pathogenic variants in the JAG1 gene are a primary cause of the multi-system disorder Alagille syndrome. Although variant detection rates are high for this disease, there is uncertainty associated with the classification of missense variants that leads to reduced diagnostic yield. Consequently, up to 85% of reported JAG1 missense variants have uncertain or conflicting classifications. We generated a library of 2,832 JAG1 nucleotide variants within exons 1-7, a region with a high number of reported missense variants, and designed a high-throughput assay to measure JAG1 membrane expression, a requirement for normal function. After calibration using a set of 175 known or predicted pathogenic and benign variants included within the variant library, 486 variants were characterized as functionally abnormal (n = 277 abnormal and n = 209 likely abnormal), of which 439 (90.3%) were missense. We identified divergent membrane expression occurring at specific residues, indicating that loss of the wild-type residue itself does not drive pathogenicity, a finding supported by structural modeling data and with broad implications for clinical variant classification both for Alagille syndrome and globally across other disease genes. Of 144 uncertain variants reported in patients undergoing clinical or research testing, 27 had functionally abnormal membrane expression, and inclusion of our data resulted in the reclassification of 26 to likely pathogenic. Functional evidence augments the classification of genomic variants, reducing uncertainty and improving diagnostics. Inclusion of this repository of functional evidence during JAG1 variant reclassification will significantly affect resolution of variant pathogenicity, making a critical impact on the molecular diagnosis of Alagille syndrome.
Mendelian randomization (MR), which uses genetic variants as instrumental variables (IVs), has gained popularity as a method for causal inference between phenotypes using genetic data. While efforts have been made to relax IV assumptions and develop new methods for causal inference in the presence of invalid IVs due to confounding, the reliability of MR methods in real-world applications remains uncertain. Instead of using simulated datasets, we conducted a benchmark study evaluating 16 two-sample summary-level MR methods using real-world genetic datasets to provide best-practice guidelines. Our study focused on the following crucial aspects: type I error control in the presence of various confounding scenarios (e.g., population stratification, pleiotropy, and family-level confounders like assortative mating), the accuracy of causal effect estimates, replicability, and power. By comprehensively evaluating the performance of the compared methods over one thousand exposure-outcome trait pairs, our study not only provides valuable insights into the performance and limitations of these methods but also offers practical guidance for researchers to choose appropriate MR methods for causal inference.
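As a concrete illustration of what a two-sample summary-level MR method computes, the sketch below implements the standard fixed-effect inverse-variance weighted (IVW) estimator. The abstract does not list which 16 methods were benchmarked, so IVW is shown only as a widely used example; the function name `ivw_estimate` and the input numbers are hypothetical.

```python
import numpy as np

def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect IVW causal estimate from per-SNP summary statistics.

    beta_x: SNP-exposure effect sizes
    beta_y: SNP-outcome effect sizes
    se_y:   standard errors of the SNP-outcome effects
    """
    beta_x = np.asarray(beta_x, dtype=float)
    beta_y = np.asarray(beta_y, dtype=float)
    weights = 1.0 / np.asarray(se_y, dtype=float) ** 2
    denom = np.sum(weights * beta_x ** 2)
    estimate = np.sum(weights * beta_x * beta_y) / denom
    std_err = np.sqrt(1.0 / denom)
    return estimate, std_err

# Illustrative summary statistics (made up): every SNP-outcome effect is
# exactly half of the SNP-exposure effect, so the ratio estimates agree.
bx = [0.1, 0.2, 0.3]
by = [0.05, 0.10, 0.15]
se = [0.01, 0.01, 0.01]

est, est_se = ivw_estimate(bx, by, se)
print(est, est_se)
```

IVW assumes that all instruments are valid; the confounding scenarios named above (population stratification, pleiotropy, assortative mating) are the settings in which that assumption can fail.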
Joint association analysis of multiple traits with multiple genetic variants can provide insight into genetic architecture and pleiotropy, improve trait prediction, and increase power for detecting association. Furthermore, some traits are naturally high-dimensional, e.g., images, networks, or longitudinally measured traits. Assessing significance for multitrait genetic association can be challenging, especially when the sample has population sub-structure and/or related individuals. Failure to adequately adjust for sample structure can lead to power loss and inflated type 1 error, and commonly used methods for assessing significance can work poorly with a large number of traits or be computationally slow. We developed JASPER, a fast, powerful, robust method for assessing significance of multitrait association with a set of genetic variants, in samples that have population sub-structure, admixture, and/or relatedness. In simulations, JASPER has higher power, better type 1 error control, and faster computation than existing methods, with the power and speed advantage of JASPER increasing with the number of traits. JASPER is potentially applicable to a wide range of association testing applications, including for multiple disease traits, expression traits, image-derived traits, and microbiome abundances. It allows for covariates, ascertainment, and rare variants and is robust to phenotype model misspecification. We apply JASPER to analyze gene expression in the Framingham Heart Study, where, compared to alternative approaches, JASPER finds more significant associations, including several that indicate pleiotropic effects, most of which replicate previous results, while others have not previously been reported. Our results demonstrate the promise of JASPER for powerful multitrait analysis in structured samples.
Allele-specific expression plays a crucial role in unraveling various biological mechanisms, including genomic imprinting and gene expression controlled by cis-regulatory variants. However, existing methods for quantification from RNA-sequencing (RNA-seq) reads do not adequately and efficiently remove various allele-specific read mapping biases, such as reference bias arising from reads containing the alternative allele that do not map to the reference transcriptome or ambiguous mapping bias caused by reads containing the reference allele that map differently from reads containing the alternative allele. We present Ornaments, a computational tool for rapid and accurate estimation of allele-specific transcript expression at unphased heterozygous loci from RNA-seq reads while correcting for allele-specific read mapping biases. Ornaments removes reference bias by mapping reads to a personalized transcriptome and removes ambiguous mapping bias by probabilistically assigning reads to the multiple transcripts and variant loci they map to. Ornaments is a lightweight extension of kallisto, a popular tool for fast RNA-seq quantification, that improves on the efficiency and accuracy of WASP, a popular tool for bias correction in allele-specific read mapping. In experiments with simulated and human lymphoblastoid cell-line RNA-seq reads with the genomes of the 1000 Genomes Project, we demonstrate that Ornaments is more accurate than WASP and kallisto, is nearly as efficient as kallisto, and is an order of magnitude faster than WASP per sample, with the additional cost of constructing a personalized index for multiple samples. Additionally, we show that Ornaments finds imprinted transcripts with higher sensitivity than WASP, which detects imprinted signals only at the gene level.