Genomic sequencing is essential for both biomedical research and clinical practice. While single-cell RNA sequencing (scRNA-seq) provides insights into biological processes at the cellular level, bulk RNA sequencing remains widely used for its scalability and cost-effectiveness. To explore biological heterogeneity, research efforts have been made toward inferring single-cell-like cellular compositions from bulk samples, i.e., deconvolving bulk samples into multiple cell types. However, existing deconvolution methods face two major limitations: (1) reliance on predefined gene signature matrices without accounting for inter-sample variability and (2) susceptibility to noise within biological systems. Here, we propose a cellular-component analysis (CCA) framework by leveraging a genomic-interaction-encoded image representation of RNA-seq data for substantially improved pattern discovery. The framework incorporates sample-specific gene-expression variability and derives signature patterns by utilizing a convolutional variational autoencoder and Gaussian mixture model. An image-domain linear decomposition of bulk RNA-seq data based on these sample-specific, interpretable gene-signature patterns is then performed for CCA and other downstream tasks, such as cancer subtype classification and biomarker discovery. We demonstrate that the proposed technique improves decomposition accuracy by over 14.1% in average Pearson correlation compared to existing techniques by using both simulation and experimental datasets. This approach offers an effective solution for tissue heterogeneity analysis and lays a foundation for a range of clinical and biological applications.
Unveiling tissue heterogeneity through genomic interaction-encoded image representation of RNA-sequencing data. Junyan Liu, Zixia Zhou, Yizheng Chen, Md Tauhidul Islam, Lei Xing. American Journal of Human Genetics. Pub Date: 2025-09-17. https://doi.org/10.1016/j.ajhg.2025.08.021
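The linear decomposition step mentioned in the abstract above can be illustrated with a deliberately simplified sketch. It works in plain gene space rather than the image domain, it does not use a convolutional variational autoencoder or Gaussian mixture model, and the signature matrix, proportions, and function names are hypothetical toy values. It only shows the general idea of recovering non-negative cell-type proportions from a bulk profile and scoring them by Pearson correlation.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def deconvolve(signatures, bulk, steps=500, lr=0.1):
    """Estimate non-negative weights p such that bulk ~= signatures @ p.

    Uses projected gradient descent on 0.5 * ||S p - b||^2 and clips
    negative weights to zero after every step (an assumed, generic solver).
    """
    n_genes, n_types = len(signatures), len(signatures[0])
    p = [1.0 / n_types] * n_types
    for _ in range(steps):
        resid = [
            sum(signatures[g][k] * p[k] for k in range(n_types)) - bulk[g]
            for g in range(n_genes)
        ]
        grad = [
            sum(signatures[g][k] * resid[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        p = [max(0.0, p[k] - lr * grad[k]) for k in range(n_types)]
    total = sum(p)
    # Normalise to proportions; leave untouched if everything was clipped.
    return [v / total for v in p] if total > 0 else p


# Hypothetical signature matrix: 5 genes (rows) x 3 cell types (columns).
toy_signatures = [
    [2.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 2.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0],
]
true_props = [0.5, 0.3, 0.2]
# Noise-free bulk profile generated from the linear mixing model.
toy_bulk = [sum(s * p for s, p in zip(row, true_props)) for row in toy_signatures]

estimated_props = deconvolve(toy_signatures, toy_bulk)
agreement = pearson(true_props, estimated_props)
```

With real data the bulk profile is noisy and the signatures vary between samples, which is the limitation the abstract says the framework targets; a fixed signature matrix like the one above is exactly the assumption it aims to relax.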
Pub Date: 2025-09-09; DOI: 10.1016/j.ajhg.2025.08.015
Hammad Yousaf,Maayke A de Koning,Kamal Khan,Kelly L Gilmore,Mariëtte J V Hoffer,Georgios Kellaris,Sophie Lanone,Maylis Dagouassat,Farid Ullah,Phebe N Adama van Scheltema,Delphine Heron,Yline Capri,Alma Kuechler,Bernd Schweiger,Monique C Haak,Boris Keren,Frederic Tran Mau Them,Cacha M P C D Peeters-Scholte,Frank J Kaiser,Tamara T Koopmann,Hailiang Mei,Binnaz Yalcin,Christel Depienne,Neeta L Vora,Gijs W E Santen,Erica E Davis
Fetal brain anomalies identified by prenatal ultrasound and/or magnetic resonance imaging represent a considerable healthcare burden with ∼1-2/1,000 live births. To identify the underlying etiology, trio prenatal exome sequencing or genome sequencing (ES/GS) has emerged as a comprehensive diagnostic paradigm with a reported diagnostic rate up to ∼32%. Here, we report five unrelated families with six affected individuals that presented neuroanatomical, craniofacial, and skeletal anomalies, all harboring rare, bi-allelic deleterious variants in SNAPIN, which encodes SNARE-associated protein. SNAPIN is a ubiquitously expressed component of the autophagy-lysosomal pathway that catalyzes retrograde axonal transport and synaptic transmission. To investigate the role of SNAPIN in brain development, we generated zebrafish gene ablation models, which recapitulated human-relevant disease phenotypes. Two independent, genetically stable snapin mutants exhibited pre-adulthood lethality, reduced overall length, disproportionately smaller head size, and altered brain morphology. Transcriptomic profiling of snapin mutant zebrafish heads revealed an early and progressive transcriptomic shift marked by autophagy activation with concomitant downregulation of structural and neurodevelopmental genes. Assessment of brain cellular ultrastructure with electron microscopy and light chain 3 (LC3)-II immunoblotting revealed retrograde vesicle transport defects, with an accumulation of late endosomes and autophagosomes. Together, these findings support bi-allelic pathogenic variants in SNAPIN as a likely cause for a severe neurodevelopmental syndrome and expand the growing list of autophagy-lysosome pathway regulators essential for human brain development.
Bi-allelic deleterious variants in SNAPIN, which encodes a retrograde dynein adaptor, cause a prenatal-onset neurodevelopmental disorder. American Journal of Human Genetics. Published 2025-09-09. https://doi.org/10.1016/j.ajhg.2025.08.015
Pub Date: 2025-09-08; DOI: 10.1016/j.ajhg.2025.08.014
Charlie F Rowlands,Sophie Allen,Alice Garrett,Miranda Durkie,George J Burghel,Rachel Robinson,Alison Callaway,Joanne Field,Bethan Frugtniet,Sheila Palmer-Smith,Jonathan Grant,Judith Pagan,Trudi McDevitt,Katie Snape,Helen Hanson,Terri McVeigh,Clare Turnbull
Multiplex assays of variant effect (MAVEs) provide promising new sources of functional evidence, potentially empowering improved classification of germline genomic variants, particularly rare missense variants, which are commonly assigned as variants of uncertain significance (VUSs). However, paradoxically, quantification of clinically applicable evidence strengths for MAVEs requires construction of "truthsets" comprising missense variants already robustly classified as pathogenic and benign. In this study, we demonstrate how benign truthset size is the primary driver of applicable functional evidence toward pathogenicity (PS3). We demonstrate, when using existing ClinVar classifications as a source of benign missense truthset variants, that only 19.8% (23/116) of established cancer susceptibility genes had a PS3 evidence strength of "strong" attainable when simulating validation for a hypothetical new MAVE (also applying the favorable assumption of perfect concordance). We describe a systematic framework for benign truthset construction in which all possible missense variants in a gene of interest are concurrently assessed for assignation of (likely) benignity via established ACMG/AMP combination rules, including population frequency, in silico evidence codes, and case-control signal. We apply this framework to eight hereditary breast and ovarian cancer genes, demonstrating that systematically generated benign missense truthsets allow maximum application of PS3 at greater (or equivalent) strength (reaching "moderate" for CHEK2 and "strong" for the other seven genes) than those derived from ClinVar ≥2-star classifications alone. We propose, given many genes have few existing benign-classified missense variants, that the application of this systematic framework to disease genes more broadly will be important for leveraging full value from MAVEs.
Availability of benign missense variant "truthsets" for validation of functional assays: Current status and a systematic approach. American Journal of Human Genetics. Published 2025-09-08. https://doi.org/10.1016/j.ajhg.2025.08.014
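To make concrete why benign truthset size can cap the attainable PS3 strength, here is a minimal sketch of an odds-of-pathogenicity (OddsPath) calculation under perfect concordance. Several things are assumptions rather than details taken from the abstract: the OddsPath formula and the strength thresholds (2.08, 4.33, 18.7, 350) follow the Bayesian ACMG/AMP framework as commonly used for functional evidence, and the convention of counting one benign control as if it had read out abnormal is one conservative way to avoid an infinite odds ratio. The study's exact treatment may differ.

```python
from fractions import Fraction

# Assumed OddsPath thresholds for evidence toward pathogenicity,
# checked from strongest to weakest.
PS3_THRESHOLDS = [
    ("very strong", Fraction(350)),
    ("strong", Fraction("18.7")),
    ("moderate", Fraction("4.33")),
    ("supporting", Fraction("2.08")),
]


def odds_path(n_pathogenic, n_benign, misclassified_benign=1):
    """OddsPath for an 'abnormal' assay readout with a perfectly concordant assay.

    p1: prior proportion of pathogenic variants in the truthset.
    p2: proportion pathogenic among abnormal readouts, after conservatively
        counting `misclassified_benign` benign controls as abnormal
        (an assumed convention).
    """
    total = n_pathogenic + n_benign
    p1 = Fraction(n_pathogenic, total)
    p2 = Fraction(n_pathogenic, n_pathogenic + misclassified_benign)
    return (p2 * (1 - p1)) / ((1 - p2) * p1)


def ps3_strength(odds):
    """Map an OddsPath value to the strongest evidence level it exceeds."""
    for label, threshold in PS3_THRESHOLDS:
        if odds > threshold:
            return label
    return "none"


# Hypothetical truthsets: same pathogenic count, different benign counts.
many_benign = ps3_strength(odds_path(n_pathogenic=20, n_benign=19))
fewer_benign = ps3_strength(odds_path(n_pathogenic=20, n_benign=18))
```

Under these assumptions the expression simplifies algebraically to the number of benign controls, so adding pathogenic controls does not raise the PS3 ceiling, which is consistent with the abstract's point that the benign truthset is the limiting factor.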
Pub Date: 2025-09-04; Epub Date: 2025-08-11; DOI: 10.1016/j.ajhg.2025.08.001
Pomme M F Rigter, Charlotte de Konink, Matthew J Dunn, Martina Proietti Onori, Jennifer B Humberson, Matthew Thomas, Caitlin Barnes, Carlos E Prada, K Nicole Weaver, Thomas D Ryan, Oana Caluseriu, Jennifer Conway, Emily Calamaro, Chin-To Fong, Wim Wuyts, Marije Meuwissen, Eva Hordijk, Carsten N Jonkers, Lucas Anderson, Berfin Yuseinova, Sarah Polonia, Diane Beysen, Zornitza Stark, Elena Savva, Cathryn Poulton, Fiona McKenzie, Elizabeth Bhoj, Caleb P Bupp, Stéphane Bézieau, Sandra Mercier, Amy Blevins, Ingrid M Wentzensen, Fan Xia, Jill A Rosenfeld, Tzung-Chien Hsieh, Peter M Krawitz, Miriam Elbracht, Danielle C M Veenma, Howard Schulman, Margaret M Stratton, Sébastien Küry, Geeske M van Woerden
Role of CAMK2D in neurodevelopment and associated conditions. American Journal of Human Genetics, p. 2247. Published 2025-09-04. https://doi.org/10.1016/j.ajhg.2025.08.001. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12460999/pdf/
Pub Date: 2025-09-04; Epub Date: 2025-07-29; DOI: 10.1016/j.ajhg.2025.07.005
Shelby L Hemker, Ashley Marsh, Felicia Hernandez, Elena Glick, Grace Clark, Alyssa Bashir, Krystal Jiang, Jacob O Kitzman
Variants of uncertain significance (VUSs) limit the actionability of genetic testing. A prominent example is MUTYH, a DNA repair factor underlying colorectal cancer with a pathogenic variant carrier rate of ∼1:50. To systematically interrogate MUTYH variant function, we coupled deep mutational scanning to DNA repair reporters containing its lesion substrate, 8OG:A. Our variant-to-function map covers 96.6% of possible MUTYH point variants (n = 10,941) and achieves 100% accuracy on known clinical variants (n = 247). Leveraging a large clinical registry, we observe significant associations with colorectal polyps and cancer, with more severely impaired missense variants conferring greater risk. We recapitulate functional differences between pathogenic founder alleles and highlight sites of complete missense intolerance, including residues that intercalate DNA and coordinate essential Zn²⁺ or Fe-S clusters. This map provides a resource to resolve the >1,100 existing missense VUSs in MUTYH and demonstrates a scalable strategy to interrogate other clinically relevant DNA repair factors.
Saturation mapping of MUTYH variant effects using DNA repair reporters. American Journal of Human Genetics, pp. 2010-2026. Published 2025-09-04. https://doi.org/10.1016/j.ajhg.2025.07.005. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12461019/pdf/
Pub Date: 2025-09-04; Epub Date: 2025-07-29; DOI: 10.1016/j.ajhg.2025.07.004
Julian Stamp, Samuel Pattillo Smith, Daniel Weinreich, Lorin Crawford
The lack of computational methods capable of detecting epistasis in biobanks has led to uncertainty about the role of non-additive genetic effects on complex trait variation. The marginal epistasis framework is a powerful approach because it estimates the likelihood of a SNP being involved in any interaction, thereby reducing the multiple testing burden. Current implementations of this approach have failed to scale genome wide in large human studies. To address this, we present the sparse marginal epistasis (SME) test, which concentrates the scans for epistasis to regions of the genome that have known functional enrichment for a quantitative trait of interest. By leveraging the sparse nature of this modeling setup, we develop a statistical algorithm that allows SME to run 10-90 times faster than state-of-the-art epistatic mapping methods. In a study of complex traits measured in 349,411 individuals from the UK Biobank, we show that reducing searches of epistasis to variants in functionally enriched regions facilitates the identification of genetic interactions associated with regulatory genomic elements.
Sparse modeling of interactions enables fast detection of genome-wide epistasis in biobank-scale studies. American Journal of Human Genetics, pp. 2198-2212. Published 2025-09-04. https://doi.org/10.1016/j.ajhg.2025.07.004. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12461027/pdf/
Pub Date: 2025-09-04; Epub Date: 2025-07-22; DOI: 10.1016/j.ajhg.2025.06.018
Jasmine Baker, Erik Stricker, Julie Coleman, Shamika Ketkar, Taotao Tan, Ashley M Butler, LaTerrica Williams, Latanya Hammonds-Odie, Debra Murray, Brendan Lee, Kim C Worley, Elizabeth G Atkinson
A lack of representation in genomic research and limited access to computational training create barriers for many researchers seeking to analyze large-scale genetic datasets. The All of Us Research Program provides an unprecedented opportunity to address these gaps by offering genomic data from a broad range of participants, but its impact depends on equipping researchers with the necessary skills to use it effectively. The All of Us Biomedical Researcher (BR) Scholars Program at Baylor College of Medicine aims to break down these barriers by providing early-career researchers with hands-on training in computational genomics through the All of Us Evenings with Genetics Research Program. The year-long program begins with the faculty summit, an in-person computational boot camp that introduces scholars to foundational skills for using the All of Us dataset via a cloud-based research environment. The genomics tutorials focus on genome-wide association studies (GWASs), utilizing Jupyter Notebooks and the Hail computing framework to provide an accessible and scalable approach to large-scale data analysis. Scholars engage in hands-on exercises covering data preparation, quality control, association testing, and result interpretation. By the end of the summit, participants will have successfully conducted a GWAS, visualized key findings, and gained confidence in computational resource management. This initiative expands access to genomic research by equipping early-career researchers from a variety of backgrounds with the tools and knowledge to analyze All of Us data. By lowering barriers to entry and promoting the study of representative populations, the program fosters innovation in precision medicine and advances equity in genomic research.
Implementing a training resource for large-scale genomic data analysis in the All of Us Researcher Workbench. American Journal of Human Genetics, pp. 2001-2009. Published 2025-09-04. https://doi.org/10.1016/j.ajhg.2025.06.018. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12320718/pdf/
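The GWAS workflow named in the abstract above (quality control, then association testing) can be sketched in a few lines. This is plain Python on toy data, not Hail and not the program's actual notebooks; the variant names, genotype coding (0/1/2 alternate-allele counts), and the 1% minor allele frequency cutoff are all assumptions chosen for illustration.

```python
def minor_allele_frequency(genotypes):
    """MAF from alternate-allele counts coded 0, 1, or 2 per individual."""
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    return min(alt_freq, 1 - alt_freq)


def regression_slope(genotypes, phenotype):
    """Least-squares slope of phenotype on genotype (no covariates)."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_p = sum(phenotype) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (p - mean_p) for g, p in zip(genotypes, phenotype))
    return sxy / sxx


def run_toy_gwas(variants, phenotype, min_maf=0.01):
    """Apply a MAF filter (quality control), then test each remaining variant."""
    effects = {}
    for name, genotypes in variants.items():
        if minor_allele_frequency(genotypes) < min_maf:
            continue  # drop rare or monomorphic variants before testing
        effects[name] = regression_slope(genotypes, phenotype)
    return effects


# Hypothetical data: six individuals, two variants, one quantitative trait.
toy_variants = {
    "variant_demo_1": [0, 1, 2, 0, 1, 2],
    "variant_demo_monomorphic": [0, 0, 0, 0, 0, 0],
}
toy_phenotype = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]

toy_effects = run_toy_gwas(toy_variants, toy_phenotype)
```

A real analysis would also adjust for covariates such as ancestry principal components, compute standard errors and p-values, and run on a distributed framework; the sketch only shows the order of the steps.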
Pub Date: 2025-09-04; DOI: 10.1016/j.ajhg.2025.07.015
Yu-Jyun Huang, Nuzulul Kurniansyah, Daniel F Levey, Joel Gelernter, Jennifer E Huffman, Kelly Cho, Peter W F Wilson, Daniel J Gottlieb, Kenneth M Rice, Tamar Sofer
Strong sex differences exist in sleep phenotypes and also cardiovascular diseases (CVDs). However, sex-specific causal effects of sleep phenotypes on CVD-related outcomes have not been thoroughly examined. Mendelian randomization (MR) analysis is a useful approach for estimating the causal effect of a risk factor on an outcome of interest when interventional studies are not available. We first conducted sex-specific genome-wide association studies (GWASs) for suboptimal-sleep phenotypes (insomnia, obstructive sleep apnea [OSA], short and long sleep durations, and excessive daytime sleepiness) utilizing the Million Veteran Program (MVP) dataset. We then developed a semi-empirical Bayesian framework that (1) calibrates variant-phenotype effect estimates by leveraging information across sex groups and (2) applies shrinkage sex-specific effect estimates in MR analysis to alleviate weak instrumental bias when sex groups are analyzed in isolation. Simulation studies demonstrate that the causal effect estimates derived from our framework are substantially more efficient than those obtained through conventional methods. We estimated the causal effects of sleep phenotypes on CVD-related outcomes using sex-specific GWAS data from the MVP and All of Us. Significant sex differences in causal effects were observed, particularly between OSA and chronic kidney disease, as well as long sleep duration on several CVD-related outcomes. By applying shrinkage estimates for instrumental variable selection, we identified multiple sex-specific significant causal relationships between OSA and CVD-related phenotypes. The method is generalizable and can be used to improve power and alleviate weak instrument bias when only a small sample is available for a specific condition or group.
A semi-empirical Bayes approach for calibrating weak instrumental bias in sex-specific Mendelian randomization studies. American Journal of Human Genetics, 112(9):2213-2231. Published 2025-09-04. https://doi.org/10.1016/j.ajhg.2025.07.015. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12416758/pdf/
Pub Date : 2025-09-04 DOI: 10.1016/j.ajhg.2025.08.003
Tara Dutka, Erika J Faust, C Scott Gallagher, Travis Hyams, Elyse Kozlowski, Erica Landis, Minnkyong Lee, Grace F Liou, Tamara R Litwin, Christopher Lunt, Sana H Mian, Anjene Musick, Nguyen Park, Theresa Patten, Janeth Sanchez, Sheri D Schully, Cathy Shyr, Geoffrey S Ginsburg
{"title":"All of Us Research Program year in review: 2024.","authors":"Tara Dutka, Erika J Faust, C Scott Gallagher, Travis Hyams, Elyse Kozlowski, Erica Landis, Minnkyong Lee, Grace F Liou, Tamara R Litwin, Christopher Lunt, Sana H Mian, Anjene Musick, Nguyen Park, Theresa Patten, Janeth Sanchez, Sheri D Schully, Cathy Shyr, Geoffrey S Ginsburg","doi":"10.1016/j.ajhg.2025.08.003","DOIUrl":"10.1016/j.ajhg.2025.08.003","url":null,"abstract":"","PeriodicalId":7659,"journal":{"name":"American journal of human genetics","volume":"112 9","pages":"1983-1987"},"PeriodicalIF":8.1,"publicationDate":"2025-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12461012/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145005783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-09-04 Epub Date: 2025-08-18 DOI: 10.1016/j.ajhg.2025.07.009
Michael J Betti, James Jaworski, Shilin Zhao, J Sunil Rao, Bríd M Ryan, Ann G Schwartz, Christine M Lusk, Lucie McCoy, John K Wiencke, Marino A Bruce, Stephen Chanock, Eric R Gamazon, Jacklyn N Hellwege, Melinda C Aldrich
Striking disparities in lung cancer exist, with Black/African American individuals disproportionately affected by lung cancer, yet the genetic architecture in African ancestry individuals is poorly understood. We aimed to address this by performing a comprehensive genetic association study of lung cancer, incorporating local ancestry, across 6,490 African ancestry individuals (2,390 individuals with lung cancer and 4,100 control subjects). We identified a single genome-wide significant (p < 5 × 10⁻⁸) locus, 15q25.1 (lead SNP rs17486278, OR [95% CI] = 1.34 [1.23-1.45], p = 4.52 × 10⁻¹²), that has consistently shown a strong association with lung cancer across populations. Additionally, we identified nine suggestive (p < 1 × 10⁻⁶) loci. Four of these loci (3p12.1, 8q22.2, 14q11.2, and 18q22.3) have no prior reported associations with lung cancer. We performed a multi-ancestry lung cancer meta-analysis using prior large-scale summary statistics from European and Asian ancestry populations, incorporating our African ancestry results. The meta-analysis identified 17 genome-wide significant loci, including an association with locus 4q35.2 (p = 1.22 × 10⁻⁸), a genomic region that has been previously linked to forced expiratory volume. Genome-wide SNP-based heritability for lung cancer was 16% among African ancestry individuals. Follow-up in silico functional analyses identified genetically regulated gene expression (GReX) of nine genes (AC012184.3, ADK, CCDC12, CHRNA3, EML4, PSMA4, SNRNP200, TMEM50A, and ZYG11A) associated with lung cancer risk and biological pathways relevant to cancer and lung function. Cumulatively, these findings further elucidate the genetic architecture of lung cancer in African ancestry individuals, confirming prior loci and revealing new loci.
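As a rough reading aid for the lead-SNP statistics above, the following Python sketch shows how an odds ratio and its 95% confidence interval relate to a Wald z-score and two-sided p-value. It assumes a normal approximation on the log-odds scale and works from the rounded values printed in the abstract, so only approximate agreement with the reported p-value (on the order of 10⁻¹²) should be expected. The helper name `wald_p_from_or` is hypothetical and is not taken from the study.

```python
import math


def wald_p_from_or(or_est: float, ci_low: float, ci_high: float,
                   z_crit: float = 1.959964):
    """Recover a Wald z-score and two-sided p-value from an OR and its 95% CI.

    Assumes the interval is symmetric on the log-odds scale:
    log(OR) +/- z_crit * SE.
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(or_est) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Rounded values for rs17486278 as reported in the abstract.
z, p = wald_p_from_or(1.34, 1.23, 1.45)
print(f"z = {z:.2f}, p = {p:.2e}")
print("genome-wide significant:", p < 5e-8)
```

Because the inputs are rounded to two decimals, the recovered p-value is a consistency check rather than a reproduction of the published figure.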
{"title":"Genetic analysis in African ancestry populations reveals genetic contributors to lung cancer susceptibility.","authors":"Michael J Betti, James Jaworski, Shilin Zhao, J Sunil Rao, Bríd M Ryan, Ann G Schwartz, Christine M Lusk, Lucie McCoy, John K Wiencke, Marino A Bruce, Stephen Chanock, Eric R Gamazon, Jacklyn N Hellwege, Melinda C Aldrich","doi":"10.1016/j.ajhg.2025.07.009","DOIUrl":"10.1016/j.ajhg.2025.07.009","url":null,"abstract":"<p><p>Striking disparities in lung cancer exist, with Black/African American individuals disproportionately affected by lung cancer, yet the genetic architecture in African ancestry individuals is poorly understood. We aimed to address this by performing a comprehensive genetic association study of lung cancer, incorporating local ancestry, across 6,490 African ancestry individuals (2,390 individuals with lung cancer and 4,100 control subjects). We identified a single genome-wide significant (p < 5 × 10<sup>-8</sup>) locus, 15q25.1 (lead SNP rs17486278, OR [95% CI] = 1.34 [1.23-1.45], p = 4.52 × 10<sup>-12</sup>), that has consistently shown a strong association with lung cancer across populations. Additionally, we identified nine suggestive (p < 1 × 10<sup>-6</sup>) loci. Four of these loci (3p12.1, 8q22.2, 14q11.2, and 18q22.3) have no prior reported associations with lung cancer. We performed a multi-ancestry lung cancer meta-analysis using prior large-scale summary statistics from European and Asian ancestry populations, incorporating our African ancestry results. The meta-analysis identified 17 genome-wide significant loci, including an association with locus 4q35.2 (p = 1.22 × 10<sup>-8</sup>), a genomic region that has been previously linked to forced expiratory volume. Genome-wide SNP-based heritability for lung cancer was 16% among African ancestry individuals. 
Follow-up in silico functional analyses identified genetically regulated gene expression (GReX) of nine genes (AC012184.3, ADK, CCDC12, CHRNA3, EML4, PSMA4, SNRNP200, TMEM50A, and ZYG11A) associated with lung cancer risk and biological pathways relevant to cancer and lung function. Cumulatively, these findings further elucidate the genetic architecture of lung cancer in African ancestry individuals, confirming prior loci and revealing new loci.</p>","PeriodicalId":7659,"journal":{"name":"American journal of human genetics","volume":" ","pages":"2102-2114"},"PeriodicalIF":8.1,"publicationDate":"2025-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12461003/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144881876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}