Pub Date: 2025-12-23 DOI: 10.1016/j.ajhg.2025.12.002
Alan Mejia Maza, Madison Hincher, Kevin Correia, Tammy Gillis, Ayumi Nishiyama, Ellen B. Penney, Aloysius Domingo, Rachita Yadav, Micaela G. Murcar, Patrick D. Villafria Mercado, Justin S. Han, Ean P. Norenberg, Cara Fernandez-Cerado, G. Paul Legarda, Michelle Sy, Edwin L. Muñoz, Mark C. Ang, Cid Czarina E. Diesta, Criscely Go, Nutan Sharma, D. Cristopher Bragg, Michael E. Talkowski, Marcy E. MacDonald, Jong-Min Lee, Laurie J. Ozelius, Vanessa Chantal Wheeler
{"title":"MSH3 is a genetic modifier of somatic repeat instability in X-linked dystonia parkinsonism","doi":"10.1016/j.ajhg.2025.12.002","journal":"American Journal of Human Genetics","PeriodicalIF":9.8,"publicationDate":"2025-12-23"}
Pub Date: 2025-12-22 DOI: 10.1016/j.ajhg.2025.11.016
M. Dybdahl Krebs, Vivek Appadurai, Kajsa-Lotta Georgii Hellberg, Henrik Ohlsson, Jette Steinbach, Emil Pedersen, iPSYCH Study Consortium, Thomas Werge, Jan Sundquist, Kristina Sundquist, Richard Border, Na Cai, Noah Zaitlen, Andy Dahl, Bjarni Vilhjalmsson, Jonathan Flint, Silviu-Alin Bacanu, Kenneth S. Kendler, Andrew J. Schork
Genetics as a science has roots in studying phenotypes of relatives, but molecular approaches facilitate direct measurements of genomic variation between individuals. Agricultural and human biomedical research are both emphasizing genotype-based instruments, such as polygenic scores, but unlike in agriculture, there is an emerging consensus that family variables act nearly independently of genotypes in models of human disease. However, there is insufficient theoretical treatment of these scores, especially guiding our understanding of how and why scores derived from different sources of data may combine. To advance our understanding of this phenomenon, we use 2,066,057 family records of 99,645 genotyped probands from the Integrative Psychiatric Research (iPSYCH)2015 case-cohort study to show that state-of-the-field genotype- and phenotype-based genetic instruments explain largely independent components of liability to psychiatric disorders. We support these empirical results with theoretical analysis and simulations to describe, in a human biomedical context, parameters affecting current and future performance of the two approaches, their expected interrelationships, and consistency of observed results with expectations under simple additive, polygenic liability models of disease. We conclude, at least for psychiatric disorders, that the low correlation between current phenotype- and genotype-based genetic instruments is caused by both being noisy measures of additive genetic liability. We expect they should remain complementary over the near future and therefore expect approaches integrating both sources of information to achieve more power for genetic inference.
{"title":"The relationship between genotype- and phenotype-based estimates of genetic liability to psychiatric disorders, in practice and in theory","doi":"10.1016/j.ajhg.2025.11.016","journal":"American Journal of Human Genetics","PeriodicalIF":9.8,"publicationDate":"2025-12-22"}
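The conclusion of the abstract above, that two noisy measures of the same additive genetic liability can be only weakly correlated with each other, can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not taken from the paper: the reliabilities `r2_geno` and `r2_pheno`, the sample size, and all function names are hypothetical, and each score is modeled simply as the true liability plus independent Gaussian noise. Under that toy model the expected correlation between the two scores is `sqrt(r2_geno * r2_pheno)`.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def noisy_score(g, r2, rng):
    """Return g plus independent noise, scaled so that the squared
    correlation between the score and g (unit variance) is r2 in expectation."""
    noise_sd = math.sqrt((1.0 - r2) / r2)
    return [gi + rng.gauss(0.0, noise_sd) for gi in g]


def simulate(n=20000, r2_geno=0.10, r2_pheno=0.15, seed=1):
    """Simulate a genotype-based and a phenotype-based score (both hypothetical)
    and return (observed correlation, expected correlation)."""
    rng = random.Random(seed)
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]  # true additive liability
    geno = noisy_score(g, r2_geno, rng)          # stand-in for a polygenic score
    pheno = noisy_score(g, r2_pheno, rng)        # stand-in for a family-history score
    return pearson(geno, pheno), math.sqrt(r2_geno * r2_pheno)


if __name__ == "__main__":
    observed, expected = simulate()
    print(f"observed r = {observed:.3f}, expected r = {expected:.3f}")
```

In this toy setting, scores that each capture 10% and 15% of the liability variance are expected to correlate at only about 0.12 with each other, even though both measure the same underlying quantity. The real instruments in the study are more complex than this additive-noise stand-in.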
Pub Date: 2025-12-22 DOI: 10.1016/j.ajhg.2025.11.014
Gabriela M Ramírez Renta, India D Little, Laura M Koehly, Anna J Hilliard, Kaylee L Foor, Jessica Butts, Jordan Lundeen, Chris Gunter
Genetic literacy goes beyond knowledge of genetic terms, as it requires sufficient skills and understanding to effectively facilitate health-related decision-making and participation in social discussions about genetic issues. Personal identity and beliefs have been shown to affect how individuals interact with new information, but rarely in the context of genetic literacy. In 2021, we created and disseminated a survey to two separate samples: 2,050 members of the US general public and 2,023 participants in a large genetic research study. We assessed genetic literacy through three components: subjective knowledge (Familiarity), objective knowledge (Knowledge), and knowledge comprehension (Skills), making this one of the only large-scale surveys to assess comprehension as a part of genetic literacy. We hypothesized that additional measures of identity and belief factors would enable a better understanding of how individuals process and retain genetic information. We found that confidence in one's genetic knowledge was the strongest predictor of positive scores in all three components, controlling nearly 25% of the variance in scores, while perceived importance of genetic information had a positive but weaker relationship to scores. This suggests that improving confidence, not just providing knowledge, is an important part of increasing uptake of genetics in various applications. Further, we found that multiple self-described beliefs had mixed predictive effects on all three of our genetic literacy subscales. These findings demonstrate the complexity inherent in endeavors to raise genetic literacy in the US population as an example, as well as the importance of context-specific genetics communication.
{"title":"Interaction of identity and beliefs with genetic literacy.","doi":"10.1016/j.ajhg.2025.11.014","journal":"American Journal of Human Genetics","PeriodicalIF":9.8,"publicationDate":"2025-12-22"}
Pub Date: 2025-12-22 DOI: 10.1016/j.ajhg.2025.11.015
Kirill Zaslavsky, Liyin Chen, Chloe Park, Emily M Place, Daniel Navarro-Gomez, Seyedeh M Zekavat, Christopher F Barile, Kinga M Bujakowska, Elizabeth J Rossin, Eric A Pierce
Inherited retinal degenerations (IRDs) are the leading cause of blindness in working-age adults and are thought to be monogenic with near-complete penetrance. However, traditional variant discovery based on phenotypic ascertainment may inflate penetrance estimates and obscure the true genotype-phenotype spectrum. We used large biobanks with linked genomic and clinical data to quantify the population-level penetrance of IRD-associated variants. We screened 317,964 All of Us (AoU) participants for loss-of-function or pathogenic IRD variants to curate a cohort with definite IRD-compatible genotypes. We defined three nested International Classification of Diseases (ICD)-9/10 code sets ("IRD," "retinopathy," and "screening") to derive lower- and upper-bound penetrance estimates via disease annotation frequencies (DAFs). Within a cohort of 481 AoU participants with definite IRD-compatible genotypes, DAFs ranged from 9.4% (IRD) to 28.1% (screening), which were enriched relative to the prevalence of the code sets in AoU (p < 0.001). For validation, we examined retinal imaging of UK Biobank (UKB) participants who shared variants with the AoU cohort. In the UKB, 16.1%-27.9% of participants with shared variants exhibited definite or possible IRD features, concordant with AoU estimates. Participant demographics, smoking, socioeconomic status, and comorbidities did not predict penetrance. These results show that the population penetrance of IRD-associated genotypes is markedly lower than traditionally assumed. This suggests that genetic or environmental modifiers are required to manifest disease and that IRD genotypes are more prevalent (0.7%-2.1%) than expected. These findings inform our understanding of the genetic causality of IRDs, impact the clinical use of genetic testing, and have implications for the development of therapies for IRDs.
{"title":"Low population penetrance of variants associated with inherited retinal degenerations.","doi":"10.1016/j.ajhg.2025.11.015","journal":"American Journal of Human Genetics","PeriodicalIF":9.8,"publicationDate":"2025-12-22"}
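The disease annotation frequency (DAF) bounds described in the inherited retinal degeneration abstract above can be sketched as a simple proportion: the share of genotype-positive participants who carry at least one diagnosis code from a given code set. The sketch below is only an illustration under stated assumptions. The participant IDs, the placeholder codes ("A", "B", "C", "X"), and the function name are hypothetical, and the real study used ICD-9/10 code sets that are not reproduced here. Because the three code sets are nested, the DAF can only stay the same or grow from the narrowest set to the broadest, which is what yields a lower and an upper bound.

```python
def disease_annotation_frequency(carrier_codes, code_set):
    """Fraction of carriers with at least one code in code_set.

    carrier_codes: dict mapping participant ID -> set of diagnosis codes.
    code_set: set of codes that count as an annotation of disease.
    """
    if not carrier_codes:
        raise ValueError("no carriers supplied")
    hits = sum(1 for codes in carrier_codes.values() if codes & code_set)
    return hits / len(carrier_codes)


# Hypothetical nested code sets: IRD is a subset of RETINOPATHY,
# which is a subset of SCREENING (placeholder codes, not real ICD codes).
IRD = {"A"}
RETINOPATHY = IRD | {"B"}
SCREENING = RETINOPATHY | {"C"}

# Hypothetical genotype-positive participants and their recorded codes.
carriers = {
    "p1": {"A"},
    "p2": {"B"},
    "p3": {"C"},
    "p4": set(),
    "p5": {"X"},  # a code outside every set does not count
}

if __name__ == "__main__":
    for name, codes in [("IRD", IRD), ("retinopathy", RETINOPATHY), ("screening", SCREENING)]:
        print(name, disease_annotation_frequency(carriers, codes))
```

With real data, the narrowest set would give the lower-bound penetrance estimate and the broadest set the upper bound.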
Pub Date: 2025-12-19 DOI: 10.1016/j.ajhg.2025.11.010
Jessica I. Gold, Yehuda Elkaim, Nina B. Gold, Stephanie Asher, Anna Raper, Courtney Condit, Zoe Bogus, Isaac Elysee, Laura Hennessy, Emma Kennedy, Lauren C. Briere, David A. Sweetser, Colleen Kripke, Anurag Verma, Hojjat Salmasian, Latrice Landry, Katherine L. Nathanson, Staci Kallish, Theodore G. Drivas
{"title":"Racial and socioeconomic disparities in genetic evaluation and testing in the adult patient population","doi":"10.1016/j.ajhg.2025.11.010","journal":"American Journal of Human Genetics","PeriodicalIF":9.8,"publicationDate":"2025-12-19"}
Pub Date: 2025-12-16 DOI: 10.1016/j.ajhg.2025.11.012
Li Chen, Yazhou Guo, Junren Hou, Wen Yang, Ting Qi, Jian Yang
Non-coding RNAs (ncRNAs) play crucial roles in the regulation of gene expression, but their genetic underpinnings and roles in human traits and diseases remain largely elusive. Here, we identified 38,441 long non-coding RNAs (lncRNAs) and 23,548 circular RNAs (circRNAs) from RNA sequencing (RNA-seq) data of 2,865 human cortex samples, of which 27,453 lncRNAs and all circRNAs were not reported in GENCODE. Expression quantitative trait locus (eQTL) analyses identified cis-eQTLs for 15,362 lncRNAs and 1,312 circRNAs. We showed that lncRNA- or circRNA-eQTLs were largely independent of, and had larger effects on average than, eQTLs of their adjacent or parental protein-coding genes (PCGs). The circRNA-eQTLs were highly enriched in canonical splice sites, highlighting the crucial role of back-splicing in circRNA biogenesis. LncRNA-eQTLs were enriched for heritability of brain-related complex traits and associated with 72 (11.2%) of the colocalized genome-wide association study (GWAS) signals that showed no evidence of colocalization with PCG-eQTLs or splicing quantitative trait loci (QTLs) identified in the same dataset. We showcased lncRNAs (e.g., those near VPS45, MAPT, and RGS6) and circRNAs (e.g., that for GRIN2A) that may be implicated in complex traits through genetic regulation of ncRNAs. Our study provides insights into the genetic regulation of ncRNAs and their implications in brain-related complex traits.
{"title":"Genetic control of non-coding RNAs in the human brain and their implications for complex traits.","doi":"10.1016/j.ajhg.2025.11.012","journal":"American Journal of Human Genetics","PeriodicalIF":9.8,"publicationDate":"2025-12-16"}
Pub Date: 2025-12-11 DOI: 10.1016/j.ajhg.2025.12.003
Rebecca W.Y. Chan, Lee Serpas, Meng Ni, Stefano Volpi, Linda T. Hiraki, Lai-Shan Tam, Ali Rashidfarrokhi, Priscilla C.H. Wong, Lydia H.P. Tam, Yueyang Wang, Peiyong Jiang, Alice S.H. Cheng, Wenlei Peng, Diana S.C. Han, Patty P.P. Tse, Pik Ki Lau, Wing-Shan Lee, Alberto Magnasco, Elisa Buti, Vanja Sisirak, Nora AlMutairi, K.C. Allen Chan, Rossa W.K. Chiu, Boris Reizis, Y.M. Dennis Lo
{"title":"Plasma DNA Profile Associated with DNASE1L3 Gene Mutations: Clinical Observations, Relationships to Nuclease Substrate Preference, and In Vivo Correction","doi":"10.1016/j.ajhg.2025.12.003","journal":"American Journal of Human Genetics","PeriodicalIF":9.8,"publicationDate":"2025-12-11"}
Pub Date: 2025-12-04 DOI: 10.1016/j.ajhg.2025.11.006
Leqi Xu, Wangjie Zheng, Jiaqi Hu, Yingxin Lin, Jia Zhao, Gefei Wang, Tianyu Liu, Hongyu Zhao
Large-scale biobanks provide comprehensive electronic health records (EHRs) that capture detailed clinical phenotypes, potentially enhancing disease prediction. However, traditional polygenic risk score (PRS) methods rely on simplified phenotype definitions or predefined trait sets, limiting their ability to represent the complex structures embedded within EHRs. To address this gap, we introduce EHR-embedding-enhanced PRS (EEPRS), leveraging phenotype embeddings derived from EHRs to improve PRSs using only genome-wide association study (GWAS) summary statistics. Employing embedding methods such as Word2Vec and GPT, we conducted EHR-embedding-based GWASs and identified a cardiovascular cluster via hierarchical clustering of genetic correlations. Across 41 traits in the UK Biobank, EEPRS consistently outperformed single-trait PRSs, particularly within this cluster. PRS-based phenome-wide association studies further demonstrated robust associations between EHR-embedding-based PRS and circulatory system diseases. We then developed EEPRS_optimal, a data-adaptive method that uses cross-validation to select the best embedding, yielding additional improvements. We also developed MTAG_EEPRS for multi-trait PRSs, which further improved prediction accuracy compared to single-trait PRSs and MTAG_PRS. Finally, we validated the benefits of EEPRS in the All of Us cohort for seven selected diseases. Overall, EEPRS represents a robust and interpretable framework, enhancing single-trait and multi-trait PRSs by integrating EHR embeddings.
{"title":"Improving polygenic risk prediction performance by integrating electronic health records through phenotype embedding.","authors":"Leqi Xu, Wangjie Zheng, Jiaqi Hu, Yingxin Lin, Jia Zhao, Gefei Wang, Tianyu Liu, Hongyu Zhao","doi":"10.1016/j.ajhg.2025.11.006","journal":"American Journal of Human Genetics","pages":"3030-3045","PeriodicalIF":9.8,"publicationDate":"2025-12-04"}
Pub Date: 2025-12-04 DOI: 10.1016/j.ajhg.2025.11.004
Yasuhiro Kosaka, Brandon Lopez, Nina Kishimoto, Shancy Jacob, Emilie Montenont, Rodrigo Huallanca, Graeson Coughenour, Jorge Di Paola, Justyne Ross, Kristy Lee, Matthew T Rondina, Paul F Bray, Jesse W Rowley
The interpretation of genetic variants in inherited diseases, such as inherited platelet disorders (IPDs), remains a major clinical challenge, as most are classified as variants of uncertain significance (VUSs). A key barrier to functional evaluation is the lack of accessible, lineage-appropriate assays that reliably reflect native gene regulation and cell-specific biology. To address this gap, we developed CRIMSON HD (CRISPR-edited megakaryocytes [MKs] for surveying platelet variant functions through homology-directed repair [HDR]), a CRISPR-Cas9 HDR-based genome-editing platform applicable to CD34+ cell-derived blood lineages and optimized for evaluating platelet-associated variants. Using this system, we modeled known and candidate disease-associated variants in integrin alpha 2b (ITGA2B) and integrin beta 3 (ITGB3), which encode the platelet αIIb/β3 integrin and are causative in Glanzmann thrombasthenia (GT). We introduced precise variants into primary human MKs derived from CD34+ hematopoietic stem and progenitor cells, achieving >90% editing efficiency. Edited MKs faithfully recapitulated both expression and functional phenotypes of known type I, II, and III GT variants. CRIMSON HD enabled functional evaluation and reclassification of several GT VUSs, including αIIb Gly201Ala, a population variant now shown to cause near-complete loss of αIIb/β3 expression; αIIb Ala777Asp, which results in intermediate αIIb/β3 expression and impaired agonist-induced integrin binding; and β3 Arg119Gln, previously linked to the loss of anti-HPA1a antibody binding in fetal and neonatal alloimmune thrombocytopenia (FNAIT), now shown to impair integrin surface expression. These findings demonstrate the importance of lineage-specific, physiologically relevant assays for the functional classification of platelet-related variants, providing mechanistic information and clinically meaningful insights for individuals with IPDs.
{"title":"Functional classification of platelet gene variants using CRISPR HDR in CD34+ cell-derived megakaryocytes.","doi":"10.1016/j.ajhg.2025.11.004","journal":"American Journal of Human Genetics","pages":"2888-2901","PeriodicalIF":9.8,"publicationDate":"2025-12-04"}