Pub Date: 2024-09-18 DOI: 10.1101/2024.09.17.24313718
Satoshi Koyama, Zhi Yu, Seung Hoan Choi, Sean J. Jurgens, Margaret Sunitha Selvaraj, Derek Klarin, Jennifer E. Huffman, Shoa L. Clarke, Michael N. Trinh, Akshaya Ravi, Jacqueline S. Dron, Catherine Spinks, Ida Surakka, Aarushi Bhatnagar, Kim Lannery, Whitney Hornsby, Scott M. Damrauer, Kyong-Mi Chang, Julie A. Lynch, Themistocles L. Assimes, Philip S. Tsao, Daniel J. Rader, Kelly Cho, Gina M. Peloso, Patrick T. Ellinor, Yan V. Sun, Peter WF. Wilson, The Million Veteran Program, Pradeep Natarajan
Rare coding alleles play crucial roles in the molecular diagnosis of genetic diseases. However, the systematic identification of these alleles has been challenging due to their scarcity in the general population. Here, we discovered and characterized rare coding alleles contributing to genetic dyslipidemia, a principal risk factor for coronary artery disease, among over a million individuals by combining three large contemporary genetic datasets (Million Veteran Program, n = 634,535; UK Biobank, n = 431,178; and All of Us Research Program, n = 92,304), totaling 1,158,017 multi-ancestral individuals. Unlike previous rare variant studies in lipids, this study included 238,243 individuals (20.6%) from non-European-like populations. Testing 2,997,401 rare coding variants from diverse backgrounds, we identified 800 exome-wide significant associations across 209 genes, including 176 predicted loss-of-function and 624 missense variants. Among these exome-wide associations, 130 were driven by non-European-like populations. Associated alleles were highly enriched in functional variant classes, showed significant additive and recessive associations, exhibited similar effects across populations, and resolved pathogenicity for variants enriched in African or South-Asian populations. Furthermore, we identified five lipid-related genes associated with coronary artery disease (RORC, CFAP65, GTF2E2, PLCB3, and ZNF117). Among them, RORC is a potentially novel therapeutic target, as its silencing downregulates LDL-C. This study provides resources and insights for understanding causal mechanisms, quantifying the expressivity of rare coding alleles, and identifying novel drug targets across diverse populations.
Title: Exome wide association study for blood lipids in 1,158,017 individuals from diverse populations
Journal: medRxiv - Genetic and Genomic Medicine
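The cohort and variant counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures stated in the abstract; the variable names are illustrative and not taken from the study.

```python
# Cohort sizes as stated in the abstract.
cohorts = {
    "Million Veteran Program": 634_535,
    "UK Biobank": 431_178,
    "All of Us Research Program": 92_304,
}

# The three cohorts should add up to the reported total sample size.
total = sum(cohorts.values())

# Share of participants from non-European-like populations, in percent.
non_european_like = 238_243
share_pct = 100 * non_european_like / total

# The significant associations split into predicted loss-of-function and missense.
plof, missense = 176, 624
associations = plof + missense

print(total)                 # 1158017
print(round(share_pct, 1))   # 20.6
print(associations)          # 800
```

All three printed values agree with the numbers reported in the abstract.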
Pub Date: 2024-09-18 DOI: 10.1101/2024.09.17.24313555
Delong Liu, Carolyn Beth Mervis, Mark Levin, Elisa Biamino, Maria Francesca Bedeschi, Maria Cristina Digilio, Gabriella Maria Squeo, Roberta Villa, Neelam Raja, Joy Lynne Freeman, Sharon Osgood, Giuseppe Merla, Amy Roberts, Colleen Morris, Lucy R Osborne, Beth Kozel
In a previous pathway-based, extreme phenotype study, we identified 1064 variants associated with supravalvar aortic stenosis (SVAS) severity in people with Williams syndrome (WS) and either no SVAS or surgical SVAS. Here, we use those variants to develop and test polygenic risk scores (PRS). We used the clumping and thresholding (CT) approach on the full 1064 variants and on a 427-variant subset that was part of 13 biologically relevant pathways identified in the previous study. We also used a lasso approach on the full set. We achieved an area under the curve (AUC) of >0.99 for the two CT PRS methods, using only 622 and 320 variants respectively, when data from two-thirds of the initial 217 participants were used for training and one-third for testing. The lasso performed less well. We then evaluated the performance of those PRS variant sets on an additional group of 138 patients with WS with intermediate-severity SVAS and found a misclassification rate of <10% between the surgical and intermediate groups, suggesting potential for clinical utility of the score.
Title: Identifying individuals at risk for surgical supravalvar aortic stenosis by polygenic risk score with graded phenotyping
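As an illustration of the clumping and thresholding idea named in this abstract, the sketch below keeps variants that pass a p-value cutoff, greedily drops any variant correlated with an already kept, more significant one, sums effect-weighted dosages into a score, and evaluates that score with a rank-based AUC. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the squared-correlation measure, the cutoffs, and the toy data are all hypothetical.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation between two dosage vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def clump_and_threshold(genotypes, pvalues, p_max, r2_max):
    """Return indices of variants kept after thresholding, then greedy clumping.

    genotypes is variant-major: genotypes[i][s] is the dosage of variant i
    in sample s. Variants are visited from most to least significant.
    """
    passing = [i for i, p in enumerate(pvalues) if p <= p_max]
    kept = []
    for i in sorted(passing, key=lambda i: pvalues[i]):
        if all(pearson_r2(genotypes[i], genotypes[j]) < r2_max for j in kept):
            kept.append(i)
    return kept


def polygenic_score(genotypes, betas, kept):
    """Effect-weighted sum of dosages over the kept variants, per sample."""
    n_samples = len(genotypes[0])
    return [sum(betas[i] * genotypes[i][s] for i in kept) for s in range(n_samples)]


def auc(scores, labels):
    """Probability that a random case outscores a random control (ties count half)."""
    cases = [s for s, l in zip(scores, labels) if l == 1]
    controls = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((c > d) + 0.5 * (c == d) for c in cases for d in controls)
    return wins / (len(cases) * len(controls))


# Hypothetical toy data: 4 variants x 6 samples.
genotypes = [
    [0, 0, 1, 1, 2, 2],
    [0, 0, 1, 1, 2, 2],  # in perfect LD with variant 0, so it is clumped away
    [2, 1, 0, 2, 1, 0],
    [1, 0, 1, 0, 1, 0],  # fails the p-value threshold
]
betas = [0.5, 0.4, 0.3, 0.2]
pvalues = [1e-6, 1e-4, 1e-3, 0.5]
labels = [0, 0, 1, 0, 1, 1]  # 1 = case, 0 = control

kept = clump_and_threshold(genotypes, pvalues, p_max=0.01, r2_max=0.5)
scores = polygenic_score(genotypes, betas, kept)
toy_auc = auc(scores, labels)
print(kept, round(toy_auc, 3))
```

In practice the thresholds would be tuned on the training split only and the AUC reported on the held-out split, as the abstract describes; the toy data here are far too small for the resulting AUC to mean anything.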
Pub Date: 2024-09-16 DOI: 10.1101/2024.09.13.24313654
Marielle L Bond, I Yoseli Quiroga-Barber, Susan D'Costa, Yijia Wu, Jessica Bell, Jessica McAfee, Nicole Kramer, Sool Lee, Mary Patrucco, Douglas Phanstiel, Hyejung Won
Genome-wide association studies have identified loci associated with Alzheimer's disease (AD), but identifying the exact causal variants and genes at each locus is challenging due to linkage disequilibrium and their largely non-coding nature. To address this, we performed a massively parallel reporter assay of 3,576 AD-associated variants in THP-1 macrophages in both resting and proinflammatory states and identified 47 expression-modulating variants (emVars). To understand the endogenous chromatin context of emVars, we built an activity-by-contact model using epigenomic maps of macrophage inflammation and inferred condition-specific enhancer-promoter pairs. Intersection of emVars with enhancer-promoter pairs and microglia expression quantitative trait loci allowed us to connect 39 emVars to 76 putative AD risk genes enriched for AD-associated molecular signatures. Overall, systematic characterization of AD-associated variants enhances our understanding of the regulatory mechanisms underlying AD pathogenesis.
Title: Deciphering the functional impact of Alzheimers Disease-associated variants in resting and proinflammatory immune cells
Pub Date: 2024-09-16 DOI: 10.1101/2024.09.13.24313644
Emma N Somerville, Alva James, Christian Beatz, Robert Schwieger, Gal Barrel, Krishna K Kandaswamy, Marius I Iurascu, Peter Bauer, Michael Ta, Hirotaka Iwaki, Konstantin Senkevich, Eric Yu, Roy N Alcalay, Ziv Gan-Or
GBA1 variants and decreased glucocerebrosidase (GCase) activity are implicated in Parkinson's disease (PD). We investigated the hypothesis that increased levels of glucosylceramide (GlcCer), one of GCase's main substrates, are involved in PD pathogenesis. Using multiple genetic methods, we show that ATP10D, not GBA1, is the main regulator of plasma GlcCer levels, yet it is not involved in PD pathogenesis. Plasma GlcCer levels were associated with PD, but not in a causative manner, and were not predictive of disease status. These results argue against targeting GlcCer in GBA1-PD and underscore the need to explore alternative mechanisms and biomarkers for PD.
Title: Plasma glucosylceramide levels are regulated by ATP10D and are not involved in Parkinson's disease pathogenesis.
Pub Date: 2024-09-16 DOI: 10.1101/2024.09.16.24313748
Katelyn E Connelly, Katherine Hullin, Ehssan Abdolalizadeh, Jun Zhong, Daina Eiser, Aidan O'Brien, Irene Collins, Sudipto Das, Gerard Duncan, Pancreatic Cancer Cohort Consortium, Pancreatic Cancer Case-Control Consortium, Stephen Chanock, Rachael Z Stolzenberg-Solomon, Alison Klein, Brian M Wolpin, Jason W Hoskins, Thorkell Andresson, Jill P Smith, Laufey T Amundadottir
Pancreatic ductal adenocarcinoma (PDAC) is the third leading cause of cancer-related deaths in the U.S. Both rare and common germline variants contribute to PDAC risk. Here, we fine-map and functionally characterize a common PDAC risk signal at 1p36.33 (tagged by rs13303010) identified through a genome-wide association study (GWAS). One of the fine-mapped SNPs, rs13303160 (r² = 0.93 in 1000G EUR samples, OR = 1.23, P = 2.74 × 10⁻⁹), demonstrated allele-preferential gene regulatory activity in vitro and allele-preferential binding of JunB and JunD in vitro and in vivo. Expression quantitative trait locus (eQTL) analysis identified KLHL17 as a likely target gene underlying the signal. Proteomic analysis identified KLHL17 as a member of the Cullin-E3 ubiquitin ligase complex in PDAC-derived cells. In silico differential gene expression analysis of the GTEx v8 pancreas data suggested an association between lower KLHL17 expression (risk-associated) and pro-inflammatory pathways. We hypothesize that KLHL17 may mitigate inflammation by recruiting pro-inflammatory proteins for ubiquitination and degradation, thereby influencing PDAC risk.
Title: Allelic effects on KLHL17 expression likely mediated by JunB/D underlie a PDAC GWAS signal at chr1p36.33
Pub Date: 2024-09-16 DOI: 10.1101/2024.09.16.24313728
Opeyemi Soremekun, Young-chan Park, Mauro Tutino, Allan Kalungi, Moffat J. Nyirenda, Segun Fatumo, Eleftheria Zeggini
Individuals of African ancestry remain largely underrepresented in genetic and proteomic studies. Here, for the first time in the Ugandan population, we measure the levels of 2,873 proteins using the Olink proximity extension assay in plasma samples from 163 individuals with type 2 diabetes (T2D) or prediabetes and 362 normoglycemic controls. We identify 88 differentially expressed proteins between the two groups and 208 proteins associated with cardiometabolic traits. We link genome-wide data to protein expression levels and construct the first protein quantitative trait locus (pQTL) map in this population. We identify 399 independent associations, comprising 346 (86.7%) cis-pQTLs and 53 (13.3%) trans-pQTLs. Of these, 16.7% of the cis-pQTLs and all of the trans-pQTLs have not been previously reported in African-ancestry individuals, and 37 pQTLs have not been previously reported in any population. We find evidence for colocalization between a pQTL for SIRPA and T2D genetic risk. Mendelian randomization analysis identified 20 proteins causally associated with T2D. Our findings reveal proteins causally implicated in the pathogenesis of T2D, which may be leveraged for personalized medicine tailored to African-ancestry individuals.
Title: Linking the plasma proteome to genetics in individuals from continental Africa provides insights into type 2 diabetes pathogenesis
Knee pain is a widespread musculoskeletal condition affecting millions globally, with significant socio-economic implications. This study endeavors to identify genetic variants associated with knee pain through a comprehensive genome-wide association study (GWAS) using data from 441,757 individuals in the UK Biobank. The primary GWAS identified ten significant loci, including eight novel loci, with the most significant single nucleotide polymorphism (SNP) being rs143384 near the GDF5 gene on chromosome 20 (p = 4.68 × 10⁻¹⁹). In the replication study, seven loci (rs143384, rs919642, rs55760279, rs56076919, rs3892354, rs687878, rs368636424) were found to be significant in the FinnGen cohort. Further, sex-specific analyses revealed distinct genetic associations, identifying three loci (rs143384 with p = 1.70 × 10⁻¹⁵, rs56076919 with p = 1.60 × 10⁻⁹, rs919642 with p = 1.45 × 10⁻⁸) in females and four loci (rs2899611 with p = 2.77 × 10⁻¹¹, rs891720 with p = 5.55 × 10⁻¹¹, rs2742313 with p = 4.19 × 10⁻⁹, rs2019689 with p = 6.51 × 10⁻⁹) in males. The phenome-wide association analysis and Mendelian randomization analysis revealed significant links between several phenotypes and knee pain, such as leg pain on walking. These findings enhance our understanding of the genetic factors of knee pain, offering potential pathways for therapeutic interventions and personalized medical strategies.
Title: A Genome-wide Association Study Identifies Novel Genetic Variants Associated with Knee Pain in the UK Biobank (N = 441,757)
Pub Date: 2024-09-16 DOI: 10.1101/2024.09.16.24313726
Yiwen Tao, Qi Pan, Tengda Cai, Luning Yang, Mainul Haque, Tania Dottorini, Weihua Meng
Pub Date: 2024-09-16 DOI: 10.1101/2024.09.15.24313695
Lin Shen, Yifang Yang, Lei Lu, Oscar Hou In Chou, Quinncy Lee, Tong Liu, Guoliang Li, Shuk Han Cheng, Gary Tse, Jiandong Zhou
Background: Epidemiological studies have linked the use of the anti-diabetic medications sodium-glucose co-transporter-2 inhibitors (SGLT2I), dipeptidyl peptidase-4 inhibitors (DPP4I) and glucagon-like peptide-1 receptor agonists (GLP1RA) with prostate cancer risk. However, these studies cannot infer causality. Methods: This was a two-sample Mendelian randomization (MR) study using genome-wide association study data, designed to identify causal relationships between SGLT2I, DPP4I or GLP1RA and prostate cancer. Genetic associations with HbA1c and risk of prostate cancer were extracted from the IEU Open-GWAS Project database with GWAS ids ukb-d-30750_irnt (UK Biobank cohort) and ebi-a-GCST006085 (European Molecular Biology Laboratory's European Bioinformatics Institute cohort), respectively. The two GWAS datasets chosen were obtained from individuals of European ancestry to minimise potential bias from population stratification. The genes encoding the targets of SGLT2I, DPP4I and GLP1RA were SLC5A2, DPP4 and GLP1R, located at Chr16: 31494323-31502181, Chr2: 162848755-162930904 and Chr6: 39016557-39059079, respectively. Results: A total of 31, 2 and 5 single nucleotide variants (SNVs) were used for SLC5A2, DPP4 and GLP1R, respectively. Our MR analysis results supported a causal relationship between genetic variation in SLC5A2 and DPP4 and reduced risk of prostate cancer at the Bonferroni-corrected threshold, with odds ratios (OR) [95% confidence intervals] of 0.47 [0.38-0.58] and 0.35 [0.24-0.53], but not for GLP1R (OR: 1.39 [0.93-2.07]). Sensitivity analyses by the leave-one-out method did not significantly alter the OR for SGLT2I. Conclusions: The two-sample MR analysis found that SGLT2 and DPP4 inhibition, but not GLP1R agonism, was associated with lower risks of developing prostate cancer.
Title: Genetic associations between SGLT2 inhibition, DPP4 inhibition or GLP1R agonism and prostate cancer risk: a two-sample Mendelian randomisation study
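To make the reported odds ratios and Bonferroni-corrected threshold concrete, the sketch below shows how per-variant summary statistics can be combined into a single causal estimate in a two-sample MR. It assumes a fixed-effect inverse-variance weighted (IVW) estimator, which the abstract does not name, and three tests (one per drug target) for the Bonferroni correction. The function names and all input numbers are hypothetical and are not taken from the study.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate and its standard error.

    Each variant contributes a Wald ratio beta_outcome / beta_exposure,
    weighted by beta_exposure**2 / se_outcome**2.
    """
    numerator = sum(
        bx * by / (se * se)
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx * bx / (se * se) for bx, se in zip(beta_exposure, se_outcome)
    )
    return numerator / denominator, math.sqrt(1.0 / denominator)


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical summary statistics for three instruments.
beta_exposure = [0.10, 0.20, 0.15]     # variant effect on the exposure
beta_outcome = [-0.05, -0.10, -0.075]  # variant effect on the outcome (log odds)
se_outcome = [0.02, 0.02, 0.02]        # standard error of the outcome effect

beta, se = ivw_estimate(beta_exposure, beta_outcome, se_outcome)

# Convert the log odds ratio to an odds ratio with a 95% confidence interval.
odds_ratio = math.exp(beta)
ci_low = math.exp(beta - 1.96 * se)
ci_high = math.exp(beta + 1.96 * se)

# Bonferroni correction, assuming three tests (one per drug target).
alpha = 0.05 / 3
p_value = two_sided_p(beta / se)
significant = p_value < alpha

print(f"OR {odds_ratio:.2f} [{ci_low:.2f}-{ci_high:.2f}], significant: {significant}")
```

A leave-one-out sensitivity analysis, as mentioned in the abstract, would repeat `ivw_estimate` with each instrument dropped in turn and compare the resulting odds ratios.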
Pub Date : 2024-09-15DOI: 10.1101/2024.09.13.24313501
Trisha P. Gupte, Zahra Azizi, Pik Fang Kho, Jiayan Zhou, Kevin Nzenkue, Ming-Li Chen, Daniel J. Panyard, Rodrigo Guarischi-Sousa, Austin T. Hilliard, Disha Sharma, Kathleen Watson, Fahim Abbasi, Philip S. Tsao, Shoa L. Clarke, Themistocles L. Assimes
Aims/hypothesis: The plasma proteome holds promise as a diagnostic and prognostic tool that can accurately reflect complex human traits and disease processes. We assessed the ability of plasma proteins to predict type 2 diabetes mellitus (T2DM) and related traits. Methods: Clinical, genetic, and high-throughput proteomic data from three subcohorts of UK Biobank participants were analyzed for association with dual-energy X-ray absorptiometry (DXA)-derived truncal fat (in the adiposity subcohort), estimated maximum oxygen consumption (VO2max) (in the fitness subcohort), and incident T2DM (in the T2DM subcohort). We used least absolute shrinkage and selection operator (LASSO) regression to assess the relative ability of non-proteomic and proteomic variables to associate with each trait by comparing variance explained (R²) and area under the curve (AUC) statistics between data types. Stability selection with randomized LASSO regression identified the most robustly associated proteins for each trait. The benefit of proteomic signatures (PSs) over QDiabetes, a T2DM clinical risk score, was evaluated through the derivation of delta (Δ) AUC values. We also assessed the incremental gain in model performance metrics using proteomic datasets with varying numbers of proteins. A series of two-sample Mendelian randomization (MR) analyses were conducted to identify potentially causal proteins for adiposity, fitness, and T2DM. Results: Across all three subcohorts, the mean age was 56.7 years and 54.9% were female. In the T2DM subcohort, 5.8% developed incident T2DM over a median follow-up of 7.6 years. LASSO-derived PSs increased the R² of truncal fat and VO2max over clinical and genetic factors by 0.074 and 0.057, respectively. 
We observed a similar improvement in T2DM prediction over the QDiabetes score [Δ AUC: 0.016 (95% CI 0.008, 0.024)] when using a robust PS derived strictly from the T2DM outcome versus a model further augmented with non-overlapping proteins associated with adiposity and fitness. A small number of proteins (29 for truncal adiposity, 18 for VO2max, and 26 for T2DM) identified by stability selection algorithms offered most of the improvement in prediction of each outcome. Filtered and clustered versions of the full proteomic dataset supplied by the UK Biobank (ranging between 600-1,500 proteins) performed comparably to the full dataset for T2DM prediction. Using MR, we identified 4 proteins as potentially causal for adiposity, 1 as potentially causal for fitness, and 4 as potentially causal for T2DM. Conclusions/Interpretation: Plasma PSs modestly improve the prediction of incident T2DM over that possible with clinical and genetic factors. Further studies are warranted to better elucidate the clinical utility of these signatures in predicting the risk of T2DM over the standard practice of using the QDiabetes score. Candidate causally associated proteins identified through MR deserve further study as potential novel therapeutic targets for T2
{"title":"Plasma proteomic signatures for type 2 diabetes mellitus and related traits in the UK Biobank cohort","authors":"Trisha P. Gupte, Zahra Azizi, Pik Fang Kho, Jiayan Zhou, Kevin Nzenkue, Ming-Li Chen, Daniel J. Panyard, Rodrigo Guarischi-Sousa, Austin T. Hilliard, Disha Sharma, Kathleen Watson, Fahim Abbasi, Philip S. Tsao, Shoa L. Clarke, Themistocles L. Assimes","doi":"10.1101/2024.09.13.24313501","DOIUrl":"https://doi.org/10.1101/2024.09.13.24313501","url":null,"PeriodicalId":501375,"journal":{"name":"medRxiv - Genetic and Genomic Medicine","volume":"43 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142257019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-15DOI: 10.1101/2024.09.13.24313652
Trisha P. Gupte, Zahra Azizi, Pik Fang Kho, Jiayan Zhou, Ming-Li Chen, Daniel J. Panyard, Rodrigo Guarischi-Sousa, Austin T. Hilliard, Disha Sharma, Kathleen Watson, Fahim Abbasi, Shoa L. Clarke, Themistocles L. Assimes
Background: While risk stratification for atherosclerotic cardiovascular disease (ASCVD) is essential for primary prevention, current clinical risk algorithms demonstrate variability and leave room for further improvement. The plasma proteome holds promise as a future diagnostic and prognostic tool that can accurately reflect complex human traits and disease processes. We assessed the ability of plasma proteins to predict ASCVD. Method: Clinical, genetic, and high-throughput plasma proteomic data were analyzed for association with ASCVD in a cohort of 41,650 UK Biobank participants. Selected features for analysis included clinical variables such as a UK-based cardiovascular clinical risk score (QRISK3) and lipid levels, 36 polygenic risk scores (PRSs), and Olink protein expression data of 2,920 proteins. We used least absolute shrinkage and selection operator (LASSO) regression to select features and compared area under the curve (AUC) statistics between data types. Randomized LASSO regression with a stability selection algorithm identified a smaller set of more robustly associated proteins. The benefit of plasma proteins over standard clinical variables, the QRISK3 score, and PRSs was evaluated through the derivation of Δ AUC values. We also assessed the incremental gain in model performance using proteomic datasets with varying numbers of proteins. To identify potential causal proteins for ASCVD, we conducted a two-sample Mendelian randomization (MR) analysis. Result: The mean age of our cohort was 54.3 years, 53.3% were female, and 9.9% developed incident ASCVD over a median follow-up of 6.9 years. A protein-only LASSO model selected 294 proteins and returned an AUC of 0.723 (95% CI 0.708-0.737). A clinical variable and PRS-only LASSO model selected 4 clinical variables and 20 PRSs and achieved an AUC of 0.726 (95% CI 0.712-0.741). The addition of the full proteomic dataset to clinical variables and PRSs resulted in a Δ AUC of 0.010 (95% CI 0.003-0.018). 
Fifteen proteins selected by a stability selection algorithm offered improvement in ASCVD prediction over the QRISK3 risk score [Δ AUC: 0.013 (95% CI 0.005-0.021)]. Filtered and clustered versions of the full proteomic dataset (consisting of 600-1,500 proteins) performed comparably to the full dataset for ASCVD prediction. Using MR, we identified 12 proteins as potentially causal for ASCVD. Conclusion: A plasma proteomic signature performs well for incident ASCVD prediction but only modestly improves prediction over clinical and genetic factors. Further studies are warranted to better elucidate the clinical utility of this signature in predicting the risk of ASCVD over the standard practice of using the QRISK3 score.
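As a rough illustration of the Δ AUC comparison described in the abstract above, the following minimal sketch computes a rank-based AUC for two sets of risk scores and the difference between them, with a percentile bootstrap interval. The function names, the toy data, and the choice of a percentile bootstrap are assumptions made for illustration; the abstract does not state how the authors derived their confidence intervals.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random case outscores a random non-case.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def delta_auc(labels, baseline_scores, augmented_scores):
    """Gain in AUC of the augmented model over the baseline model."""
    return auc(labels, augmented_scores) - auc(labels, baseline_scores)


def bootstrap_ci(labels, baseline_scores, augmented_scores, n_boot=1000, seed=0):
    """Percentile 95% interval for delta AUC (an assumed, illustrative method)."""
    rng = random.Random(seed)
    n = len(labels)
    deltas = []
    while len(deltas) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # skip resamples that contain only one class
        base = [baseline_scores[i] for i in idx]
        aug = [augmented_scores[i] for i in idx]
        deltas.append(delta_auc(y, base, aug))
    deltas.sort()
    lo_idx = (25 * n_boot) // 1000
    hi_idx = min(n_boot - 1, (975 * n_boot) // 1000)
    return deltas[lo_idx], deltas[hi_idx]


# Hypothetical toy data: 1 = incident event, 0 = no event.
labels = [0, 0, 0, 1, 0, 1, 1, 1]
baseline = [0.1, 0.2, 0.6, 0.3, 0.4, 0.5, 0.7, 0.8]     # e.g. a clinical score alone
augmented = [0.1, 0.2, 0.3, 0.4, 0.35, 0.6, 0.7, 0.9]   # e.g. clinical score plus proteins

print(auc(labels, baseline))                   # 0.8125
print(delta_auc(labels, baseline, augmented))  # 0.1875
```

The pairwise AUC is quadratic in sample size, which is fine for a toy example; for a cohort of tens of thousands a sort-based implementation from an established statistics library would be the more practical choice.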
{"title":"A plasma proteomic signature for atherosclerotic cardiovascular disease risk prediction in the UK Biobank cohort","authors":"Trisha P. Gupte, Zahra Azizi, Pik Fang Kho, Jiayan Zhou, Ming-Li Chen, Daniel J. Panyard, Rodrigo Guarischi-Sousa, Austin T. Hilliard, Disha Sharma, Kathleen Watson, Fahim Abbasi, Shoa L. Clarke, Themistocles L. Assimes","doi":"10.1101/2024.09.13.24313652","DOIUrl":"https://doi.org/10.1101/2024.09.13.24313652","url":null,"PeriodicalId":501375,"journal":{"name":"medRxiv - Genetic and Genomic Medicine","volume":"17 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142257112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}