Pub Date: 2025-10-09; Epub Date: 2025-08-08; DOI: 10.1016/j.xhgg.2025.100490
Xiaoqi Li, Elena Kharitonova, Minxing Pang, Jia Wen, Laura Y Zhou, Laura Raffield, Haibo Zhou, Huaxiu Yao, Can Chen, Yun Li, Quan Sun
Genetic prediction of complex traits, enabled by large-scale genomic studies, has created new measures to understand individual genetic predisposition. Polygenic risk scores (PRSs) offer a way to aggregate information across the genome, enabling personalized risk prediction for complex traits and diseases. However, conventional PRS calculation methods that rely on linear models are limited in their ability to capture complex patterns and interaction effects in high-dimensional genomic data. In this study, we seek to improve the predictive power of PRS by applying advanced deep learning techniques. We show that the variational autoencoder-based model for PRS construction (VAE-PRS) outperforms current state-of-the-art methods for biobank-level data in 14 of 16 blood cell traits, while being computationally efficient. Through comprehensive experiments, we found that the VAE-PRS model can capture interaction effects in high-dimensional data and shows robust performance across different pre-screened variant sets. Furthermore, VAE-PRS is easily interpretable: the contribution of each individual marker to the final prediction score can be assessed through the Shapley additive explanations method, providing potential new insights in identifying trait-associated genetic variants. In summary, VAE-PRS offers an approach to genetic risk prediction for blood cell traits that harnesses the power of deep learning methods given an appropriate training sample size, which could further facilitate the development of personalized medicine and genetic research.
Title: Variational autoencoder-based model improves polygenic prediction in blood cell traits. Journal: HGG Advances, article 100490. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12398231/pdf/
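For readers unfamiliar with the conventional linear PRS that this abstract contrasts with VAE-PRS, the following is a minimal sketch of a weighted sum of allele dosages. The function name, the effect weights, and the dosages are hypothetical illustrations and are not taken from the study; it does not represent the VAE-PRS model itself.

```python
from typing import Sequence


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Linear PRS: sum over variants of (effect weight x allele dosage).

    Dosages are typically 0, 1, or 2 copies of the effect allele.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(w * d for w, d in zip(weights, dosages))


# Hypothetical per-variant effect sizes and one person's dosages (illustrative only).
example_weights = [0.5, -0.25, 1.0]
example_dosages = [2, 1, 0]
print(polygenic_score(example_dosages, example_weights))
```

Because each variant contributes independently, a score of this form cannot represent interaction effects between variants, which is the limitation the abstract attributes to linear methods.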
Pub Date: 2025-10-09; Epub Date: 2025-07-18; DOI: 10.1016/j.xhgg.2025.100482
Oliver Pain
Genome-wide association studies (GWASs) from multiple ancestral populations are increasingly available, offering opportunities to improve the accuracy and equity of polygenic scores (PGSs). Several methods now aim to leverage multiple GWAS sources, but predictive performance and computational efficiency remain unclear, particularly when individual-level tuning data are unavailable. This study evaluates a comprehensive set of PGS methods across African (AFR), East Asian (EAS), and European (EUR) ancestries for 10 complex traits, using summary statistics from the Ugandan Genome Resource, Biobank Japan, UK Biobank, and the Million Veteran Program. Single-source PGSs were derived using methods including DBSLMM, lassosum, LDpred2, MegaPRS, pT + clump, PRS-CS, QuickPRS, and SBayesRC. Multi-source approaches included PRS-CSx, TL-PRS, X-Wing, and combinations of independently optimized single-source scores. All methods were restricted to HapMap3 variants and used linkage disequilibrium reference panels matching the GWAS super population. A key contribution is a novel application of the LEOPARD method to estimate optimal linear combinations of population-specific PGSs using only summary statistics. Analyses were implemented using the open-source GenoPred pipeline. In AFR and EAS populations, PGSs combining ancestry-aligned and European GWASs outperformed single-source models. Linear combinations of independently optimized scores consistently outperformed current jointly optimized multi-source methods, while being substantially more computationally efficient. The LEOPARD extension offered a practical solution for tuning these combinations when only summary statistics were available, achieving performance comparable to tuning with individual-level data. These findings highlight a flexible and generalizable framework for multi-source PGS construction. The GenoPred pipeline supports more equitable, accurate, and accessible polygenic prediction.
Title: Leveraging global genetics resources to enhance polygenic prediction across ancestrally diverse populations. Journal: HGG Advances, article 100482. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12536657/pdf/
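The idea of a linear combination of population-specific PGSs can be illustrated with a small sketch. It assumes individual-level tuning data (two centered scores and a centered phenotype per person) and fits the two mixing weights by ordinary least squares. This is not the LEOPARD procedure, which the abstract says works from summary statistics only, and all names and numbers below are hypothetical.

```python
from typing import Sequence, Tuple


def fit_mixing_weights(
    score_a: Sequence[float], score_b: Sequence[float], phenotype: Sequence[float]
) -> Tuple[float, float]:
    """Least-squares weights (w_a, w_b) so that w_a*score_a + w_b*score_b fits phenotype.

    Assumes all inputs are already centered, so no intercept is fitted.
    Solves the 2x2 normal equations directly.
    """
    if not (len(score_a) == len(score_b) == len(phenotype)):
        raise ValueError("all inputs must have the same length")
    a11 = sum(x * x for x in score_a)
    a12 = sum(x * y for x, y in zip(score_a, score_b))
    a22 = sum(y * y for y in score_b)
    b1 = sum(x * p for x, p in zip(score_a, phenotype))
    b2 = sum(y * p for y, p in zip(score_b, phenotype))
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("scores are collinear; weights are not identifiable")
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det


# Hypothetical toy data: the phenotype is an exact 0.3/0.7 mix of the two scores.
ancestry_matched_pgs = [1.0, -1.0, 0.5, -0.5]
european_pgs = [0.2, 0.4, -1.0, 0.4]
toy_phenotype = [0.3 * a + 0.7 * b for a, b in zip(ancestry_matched_pgs, european_pgs)]
print(fit_mixing_weights(ancestry_matched_pgs, european_pgs, toy_phenotype))
```

On this noise-free toy input the fitted weights recover the 0.3 and 0.7 used to build the phenotype, up to floating-point error.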
Pub Date: 2025-10-09; Epub Date: 2025-07-22; DOI: 10.1016/j.xhgg.2025.100479
Cole M Williams, Jared O'Connell, Ethan Jewett, William A Freyman, Christopher R Gignoux, Sohini Ramachandran, Amy L Williams
Haplotype phasing, the process of determining which genetic variants are physically located on the same chromosome, is crucial for genetic analyses. Here, we benchmark SHAPEIT and Beagle, two state-of-the-art phasing methods, on two large datasets: >8 million research-consented 23andMe, Inc. customers and the UK Biobank (UKB). Remarkably, both methods' median switch error rate (SER) (after excluding single SNP switches, which we call "blips") is 0.00% across all tested 23andMe trio children and 0.026% in British samples from UKB. Across UKB samples, switch errors predominantly occur in regions lacking identity-by-descent (IBD) coverage. SHAPEIT and Beagle excel at intra-chromosomal phasing, but lack the ability to phase across chromosomes, motivating us to develop HAPTiC (HAPlotype Tiling and Clustering), an inter-chromosomal phasing method that assigns paternal and maternal variants genome-wide. Our approach uses IBD segments to phase blocks of variants on different chromosomes. HAPTiC represents the segments a focal individual shares with their relatives as nodes in a signed graph and performs spectral clustering. We test HAPTiC on 1,022 UKB trios, yielding a median per-site phase error of 0.13% in regions covered by IBD segments (45.1% of sites). We also ran HAPTiC in the 23andMe database and found a median phase error rate of 0.49% in Europeans (100% of sites) and 0.16% in admixed Africans (99.8% of sites). HAPTiC enables analyses that require the parent-of-origin of variants, such as association studies and ancestry inference of untyped parents.
Title: Phasing millions of samples achieves near perfect accuracy, enabling parent-of-origin analyses. Journal: HGG Advances, article 100479. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12536663/pdf/
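The switch error rate reported in this abstract can be illustrated with a small sketch. It assumes the common definition: across consecutive heterozygous sites, count how often the inferred phase flips relative to the truth, then divide by the number of consecutive pairs. The function name and the toy haplotypes are hypothetical, and the sketch does not exclude single-SNP "blips" as the study does.

```python
from typing import Sequence


def switch_error_rate(truth: Sequence[int], inferred: Sequence[int]) -> float:
    """Fraction of consecutive heterozygous-site pairs whose relative phase is wrong.

    Each sequence gives the allele (0 or 1) carried on haplotype 1 at each
    heterozygous site. A fully flipped haplotype counts as zero switches,
    because only relative phase between neighbouring sites matters.
    """
    if len(truth) != len(inferred):
        raise ValueError("truth and inferred must have the same length")
    if len(truth) < 2:
        return 0.0
    # True where the inferred haplotype 1 carries the opposite allele to the truth.
    mismatch = [t != p for t, p in zip(truth, inferred)]
    # A switch occurs wherever the mismatch state changes between neighbours.
    switches = sum(a != b for a, b in zip(mismatch, mismatch[1:]))
    return switches / (len(truth) - 1)


# Hypothetical example: the inferred phase flips once, after the second site.
true_hap = [0, 1, 0, 1, 1, 0]
inferred_hap = [0, 1, 1, 0, 0, 1]
print(switch_error_rate(true_hap, inferred_hap))
```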
Pub Date: 2025-10-09; Epub Date: 2025-07-18; DOI: 10.1016/j.xhgg.2025.100480
Noah C Helderman, Ting Yang, Claire Palles, Diantha Terlouw, Hailiang Mei, Ruben H P Vorderman, Davy Cats, Marcos Díaz-Gay, Marjolijn C J Jongmans, Ashwin Ramdien, Irma van de Beek, Thomas F Eleveld, Andrew Green, Frederik J Hes, Marry M van den Heuvel-Eibrink, Annelore Van Der Kelen, Sabine Kliesch, Roland P Kuiper, Inge M M Lakeman, Lisa E E L O Lashley, Leendert H J Looijenga, Manon S Oud, Johanna Steingröver, Yardena Tenenbaum-Rakover, Carli M Tops, Frank Tüttelmann, Richarda M de Voer, Dineke Westra, Margot J Wyrwoll, Mariano Golubicki, Marina Antelo, Laia Bonjoch, Mariona Terradas, Laura Valle, Ludmil B Alexandrov, Hans Morreau, Tom van Wezel, Sergi Castellví-Bel, Yael Goldberg, Maartje Nielsen
MCM8 and MCM9 are newly proposed cancer predisposition genes, linked to polyposis and early-onset cancer, in addition to their previously established association with hypogonadism. Given the uncertain range of phenotypic manifestations and unclear cancer risk estimates, this study aimed to delineate the molecular and clinical characteristics of biallelic germline MCM8/MCM9 variant carriers. We found significant enrichment of biallelic MCM9 variants in individuals with colonic polyps (odds ratio [OR] 6.51, 95% confidence interval [CI] 1.24-34.11, p = 0.03), rectal polyps (OR 8.40, 95% CI 1.28-55.35, p = 0.03), and gastric cancer (OR 27.03, 95% CI 2.93-248.5, p = 0.004) in data from the 100,000 Genomes Project, compared to controls. No similar enrichment was found for biallelic MCM8 variants or in the 200K UK Biobank. Likewise, in our case series, which included 26 MCM8 and 28 MCM9 variant carriers, we documented polyposis, gastric cancer, and early-onset colorectal cancer (CRC) in MCM9 carriers but not in MCM8 carriers. Moreover, our case series indicates that beyond hypogonadism, biallelic MCM8 and MCM9 variants are associated with early-onset germ cell tumors (occurring before age 15). Tumors from MCM8/MCM9 variant carriers predominantly displayed clock-like mutational processes, without evidence of DNA repair deficiency-associated signatures. Collectively, our data indicate that biallelic MCM9 variants are associated with polyposis, gastric cancer, and early-onset CRC, while both biallelic MCM8 and MCM9 variants are linked to hypogonadism and the early development of germ cell tumors. These findings underscore the importance of including MCM8/MCM9 in diagnostic gene panels for certain clinical contexts and suggest that biallelic carriers may benefit from cancer surveillance.
Title: Clinical syndromes linked to biallelic germline variants in MCM8 and MCM9. Journal: HGG Advances, article 100480. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12361757/pdf/
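To show how an odds ratio with a 95% confidence interval of the kind quoted in this abstract can be obtained from a 2x2 table, here is a minimal sketch using the Wald interval on the log odds ratio. The counts are hypothetical and are not the study's data, and the abstract does not state which interval or test the authors used, so the Wald method here is an assumption.

```python
import math
from typing import Tuple


def odds_ratio_wald(
    carrier_cases: int,
    noncarrier_cases: int,
    carrier_controls: int,
    noncarrier_controls: int,
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Return (odds ratio, lower bound, upper bound) using a Wald interval.

    The interval is exp(log(OR) +/- z * SE), with
    SE = sqrt(1/a + 1/b + 1/c + 1/d) over the four cell counts.
    """
    cells = (carrier_cases, noncarrier_cases, carrier_controls, noncarrier_controls)
    if min(cells) <= 0:
        raise ValueError("all four cell counts must be positive for a Wald interval")
    odds_ratio = (carrier_cases * noncarrier_controls) / (
        noncarrier_cases * carrier_controls
    )
    standard_error = math.sqrt(sum(1.0 / count for count in cells))
    lower = math.exp(math.log(odds_ratio) - z * standard_error)
    upper = math.exp(math.log(odds_ratio) + z * standard_error)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only.
print(odds_ratio_wald(10, 20, 5, 40))
```

With very small carrier counts, as is typical for rare biallelic variants, the Wald interval becomes wide and an exact method may be preferable; the wide intervals in the abstract are consistent with few carriers.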
Pub Date: 2025-10-09; Epub Date: 2025-07-26; DOI: 10.1016/j.xhgg.2025.100485
Erfan Aref-Eshghi, Ingrid M Wentzensen, Tawfeg Ben-Omran, Reem Ibrahim Bux, Nina B Gold, Erin McRoy, Hoanh Nguyen, Lauren O'Grady, Shao Ching Tu, Yanmin Chen, Leandra Folk, Bobbi McGivern
Cohesin is a multiprotein complex that maintains chromosome integrity during cell division. Disruptions in cohesin or its regulators, including CHTF18, can lead to neurodevelopmental and congenital disorders known as cohesinopathies. CHTF18 participates in cohesin loading during DNA replication, but its role in human disease is not understood. Through exome analysis of >665,000 individuals, we identified multiple (<10) unrelated individuals with rare missense variants in CHTF18 and overlapping clinical phenotypes suggestive of a cohesinopathy disorder. Among these, three individuals with neurodevelopmental delay and epilepsy, each carrying a previously unreported rare de novo variant in CHTF18, are presented in detail. Overlapping clinical features of additional individuals who were not available for case-level consent are presented in aggregate. All the CHTF18 variants in the cohort were located in the vicinity of the AAA+ ATPase domain of CHTF18, which plays a crucial role in cohesin loading during DNA replication. In addition to cohort findings from our large database, the function, relevance, and pathway involvement of CHTF18 make it a promising candidate gene for disease. The study calls for further research to explore the role of CHTF18 variants in disease and highlights the importance of including CHTF18 as a candidate gene in broad genetic testing for individuals with unsolved neurodevelopmental conditions.
Title: De novo missense variants in CHTF18: The potential to expand the clinical spectrum of cohesinopathies. Journal: HGG Advances, article 100485. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12375246/pdf/
Pub Date: 2025-10-09; Epub Date: 2025-07-05; DOI: 10.1016/j.xhgg.2025.100478
Dimuthu Alankarage, Iryna Leshchynska, Stephanie Portelli, Alena Sipka, Gillian M Blue, Victoria O'Reilly, Debjani Das, Emma M Rath, Annabelle Enriquez, Michael Troup, Miriam Fine, Nicola Poplawski, Maxim Verlee, David T Humphreys, Richard P Harvey, Gavin Chapman, Edwin P Kirk, David S Winlaw, Bert Callewaert, Wendy K Chung, David Ascher, Eleni Giannoulatou, Sally L Dunwoodie
Mothers against decapentaplegic homolog 5 (SMAD5) is a transcriptional regulator that functions within the TGF-β signaling cascade. Evidence from animal studies shows that it is crucial for dorsoventral patterning, left-right asymmetry, cardiac looping, and other embryonic processes. However, its role in human development has not been explored, and the contribution of SMAD5 variants to congenital disease is unknown. Here, we report SMAD5 variants identified in six unrelated families with seven individuals presenting with congenital heart disease (CHD). Isolated congenital heart defects are observed in six individuals who carry de novo or inherited missense, nonsense, frameshift, or copy-number variants in SMAD5. A multi-organ phenotype is observed in one individual with a de novo SMAD5 variant that alters an amino acid crucial for SMAD5 multimerization. Septal defects, identified in four individuals, are the most common cardiac lesion in our cohort, with hypoplastic left heart also observed in two individuals. In silico assessment of SMAD5 missense variants predicts disrupted binding to co-factors, and in vitro functional assessment shows changes in SMAD5 gene and protein expression, as well as impaired activation of a BMP4-responsive promoter by the variants. Our findings suggest haploinsufficiency as the underlying molecular mechanism in five of the six families, resulting in isolated CHD, with a SMAD5 dominant-negative variant identified in one family leading to multiple congenital defects. Here, we provide evidence that SMAD5 variants lead to CHD and offer a basis for future exploration of SMAD5 variants in both CHD and post-natal disease.
Title: Haploinsufficient variants in SMAD5 are associated with isolated congenital heart disease. Journal: HGG Advances, article 100478. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12305712/pdf/
Pub Date: 2025-10-09; Epub Date: 2025-08-14; DOI: 10.1016/j.xhgg.2025.100494
Jin Huang, Shijie Liang, Jiamin Sun, Huaping Chen
Hepatocellular carcinoma (HCC) progression is driven by metabolic reprogramming in the tumor microenvironment (TME), yet the causal regulators of pyruvate metabolism and their spatial interplay remain elusive. Here, we integrate single-cell transcriptomics, spatial mapping, and genetic causal inference to identify a pyruvate-hyperactive epithelial subpopulation (PyHighEpi) in HCC, characterized by enhanced stemness, proliferation, and metastatic traits. Spatial analyses reveal metabolic zonation, with pyruvate activity concentrated in tumor cores and associated with aggressive clones. Summary data-based Mendelian randomization identifies fumarylacetoacetate hydrolase domain containing 1 (FAHD1) as a potential causal driver, with its expression associated with a poor prognosis. FAHD1+epi cells interact with cancer-associated fibroblasts through ITGB2-mediated interactions, facilitating the formation of a transforming growth factor-β/vascular endothelial growth factor-enriched niche that promotes immune evasion. Clinically, FAHD1 overexpression correlated with poor prognosis, validated through functional assays showing its knockdown suppressed proliferation, invasion, and migration in HCC models. An FAHD1-derived risk score robustly stratifies patient prognosis and predicts responsiveness to immunotherapy, while molecular docking highlighted tivozanib as a potential FAHD1-targeting agent.
Title: FAHD1-mediated pyruvate metabolism in hepatocellular carcinoma: Multi-omics and causal genetic evidence. Journal: HGG Advances, article 100494. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12414894/pdf/
Pub Date: 2025-10-09 Epub Date: 2025-07-28 DOI: 10.1016/j.xhgg.2025.100486
Christina G Hutten, Frederick J Boehm, Jennifer A Smith, Brian W Spitzer, Sylvia Wassertheil-Smoller, Carmen R Isasi, Jianwen Cai, Jonathan T Unkart, Jiehuan Sun, Victoria Persky, Martha L Daviglus, Tamar Sofer, Maria Argos
Coronary heart disease (CHD) is a leading cause of death among Hispanics/Latinos in the United States (US), whose underrepresentation in genomic research may worsen health disparities. We evaluated the predictive performance of polygenic risk scores (PRSs) for myocardial infarction (MI) using data from the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), a cohort of 16,415 participants from 4 US centers. Standardized CHD-PRSs were derived (LDpred, AnnoPred, stacked clumping and thresholding, and LDPred2-GPSmult) and evaluated in survey-weighted Cox models for time to adjudicated MI, adjusted for age, sex, and the first 5 principal components. Analyses were stratified by Caribbean (Puerto Rican, Dominican, or Cuban) and Mainland (Mexican, Central American, or South American) heritage. The concordance statistic (C-index), integrated discrimination improvement (IDI), and net reclassification improvement (NRI) were used to compare PRS performance with that of traditional risk factors (TRFs). Over 13 years (2008-2021), MI incidence was 1.9% (n = 140/7,248); mean age was 48.7 years, and 61% were female. PRSs showed stronger associations with MI among Mainland participants; LDPred2-GPSmult+TRFs performed best (hazard ratio = 2.09; 95% confidence interval 1.59-2.75; C-index = 0.884; IDI p < 0.001; NRI p < 0.001; and improved C-index over TRFs by 0.008). Among Caribbean participants, AnnoPred+TRFs performed best (C-index = 0.739) and LDPred2-GPSmult discriminated best (IDI p = 0.02), but neither was significantly associated with MI risk. PRS performance remains limited among Caribbean individuals with substantial African ancestry. The results for AnnoPred and LDPred2-GPSmult suggest that leveraging functional annotations and multi-trait approaches may enhance risk prediction in diverse populations. These findings emphasize the need to optimize genetic risk prediction of CHD in underrepresented Hispanic/Latino populations.
{"title":"Differential performance of polygenic risk scores for heart disease in Hispanic/Latino subgroups: Findings of the Hispanic Community Health Study/Study of Latinos.","authors":"Christina G Hutten, Frederick J Boehm, Jennifer A Smith, Brian W Spitzer, Sylvia Wassertheil-Smoller, Carmen R Isasi, Jianwen Cai, Jonathan T Unkart, Jiehuan Sun, Victoria Persky, Martha L Daviglus, Tamar Sofer, Maria Argos","doi":"10.1016/j.xhgg.2025.100486","DOIUrl":"10.1016/j.xhgg.2025.100486","url":null,"PeriodicalId":34530,"journal":{"name":"HGG Advances","volume":" ","pages":"100486"},"PeriodicalIF":3.6,"publicationDate":"2025-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12391791/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144745282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-09 Epub Date: 2025-06-21 DOI: 10.1016/j.xhgg.2025.100470
Yitang Sun, Huifang Xu, Kaixiong Ye
Previous genome-wide association studies (GWAS) have identified genetic loci associated with the circulating levels of fatty acids (FAs), but the biological mechanisms of these genetic associations remain largely unexplored. Here, we conducted GWAS to identify additional genetic loci for 19 circulating FA traits in UK Biobank participants of European ancestry (n = 239,268) and five other ancestries (n = 508-4,663). We leveraged the GWAS findings to characterize genetic correlations and colocalized regions among FAs, explore sex differences, examine FA loci influenced by lipoprotein metabolism, and apply statistical fine-mapping to pinpoint putative causal variants. We integrated GWAS signals with multi-omics quantitative trait loci (QTL) to reveal intermediate molecular phenotypes mediating the associations between the genetic loci and FA levels. We identified 215 genome-wide significant, independent loci for polyunsaturated fatty acid (PUFA)-related traits in European participants, 163 loci for monounsaturated fatty acid (MUFA)-related traits, and 119 loci for saturated fatty acid (SFA)-related traits, including 70, 61, and 54 novel loci, respectively. A novel locus for total FAs, the percentage of omega-6 PUFAs in total FAs, and total MUFAs (around genes GSTT1/2/2B) colocalized with QTL signals for all six molecular phenotypes examined, including gene expression, protein abundance, DNA methylation, splicing, histone modification, and chromatin accessibility. Across 19 FA traits, 35% of GWAS loci colocalized with QTL signals for at least one molecular phenotype. Our study identifies novel genetic loci for circulating FA levels and systematically uncovers their underlying molecular mechanisms.
{"title":"GWAS and multi-omics integrative analysis reveal novel loci and their molecular mechanisms for circulating fatty acids.","authors":"Yitang Sun, Huifang Xu, Kaixiong Ye","doi":"10.1016/j.xhgg.2025.100470","DOIUrl":"10.1016/j.xhgg.2025.100470","url":null,"PeriodicalId":34530,"journal":{"name":"HGG Advances","volume":" ","pages":"100470"},"PeriodicalIF":3.6,"publicationDate":"2025-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12272887/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144369236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-09 Epub Date: 2025-08-23 DOI: 10.1016/j.xhgg.2025.100497
Claire M Kittock, Krishna Karia, Pratiksha Kc, Claire Evans, Jared Wollman, Brandon L Meyerink, Louis-Jan Pilaz
The increasing availability and affordability of genetic testing have resulted in the identification of numerous novel variants associated with neurodevelopmental disorders. There remains a need for methods to analyze the functional impact of these variants. Some methods, like expressing these variants in cell culture, may be rapid, but they lack physiologic context. Other methods, like making a whole-mouse model, may provide physiologic accuracy, but these are costly and time-consuming. We recently developed a technique, Breasi-CRISPR (Brain Easi-CRISPR), which results in efficient genome editing of neural precursor cells via electroporation of CRISPR-Cas9 reagents into developing mouse brains. Since Breasi-CRISPR is extremely rapid and enables the analysis of targeted genes in vivo, we asked whether this technique would accelerate the study of monogenic neurodevelopmental disorders. Here, we use Breasi-CRISPR to model megalencephaly postaxial polydactyly polymicrogyria hydrocephalus (MPPH) syndrome. Two days after Breasi-CRISPR, we observed neurodevelopmental phenotypes known to be associated with MPPH syndrome, including increased cyclin D2 protein abundance and increased neural progenitor proliferation. Thus, Breasi-CRISPR can efficiently model MPPH syndrome and may be a powerful method to add to the toolbox of those investigating the functional impact of patient variants in neurodevelopmental disorders.
{"title":"Modeling MPPH syndrome in vivo using Breasi-CRISPR.","authors":"Claire M Kittock, Krishna Karia, Pratiksha Kc, Claire Evans, Jared Wollman, Brandon L Meyerink, Louis-Jan Pilaz","doi":"10.1016/j.xhgg.2025.100497","DOIUrl":"10.1016/j.xhgg.2025.100497","url":null,"PeriodicalId":34530,"journal":{"name":"HGG Advances","volume":" ","pages":"100497"},"PeriodicalIF":3.6,"publicationDate":"2025-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12447981/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144972063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}