medRxiv - Genetic and Genomic Medicine: Latest Articles

Exome wide association study for blood lipids in 1,158,017 individuals from diverse populations

medRxiv - Genetic and Genomic Medicine

Pub Date : 2024-09-18 DOI: 10.1101/2024.09.17.24313718

Satoshi Koyama, Zhi Yu, Seung Hoan Choi, Sean J. Jurgens, Margaret Sunitha Selvaraj, Derek Klarin, Jennifer E. Huffman, Shoa L. Clarke, Michael N. Trinh, Akshaya Ravi, Jacqueline S. Dron, Catherine Spinks, Ida Surakka, Aarushi Bhatnagar, Kim Lannery, Whitney Hornsby, Scott M. Damrauer, Kyong-Mi Chang, Julie A. Lynch, Themistocles L. Assimes, Philip S. Tsao, Daniel J. Rader, Kelly Cho, Gina M. Peloso, Patrick T. Ellinor, Yan V. Sun, Peter WF. Wilson, The Million Veteran Program, Pradeep Natarajan

Rare coding alleles play crucial roles in the molecular diagnosis of genetic diseases. However, the systematic identification of these alleles has been challenging due to their scarcity in the general population. Here, we discovered and characterized rare coding alleles contributing to genetic dyslipidemia, a principal risk factor for coronary artery disease, among over a million individuals by combining three large contemporary genetic datasets (Million Veteran Program, n = 634,535; UK Biobank, n = 431,178; and All of Us Research Program, n = 92,304), totaling 1,158,017 multi-ancestral individuals. Unlike previous rare variant studies in lipids, this study included 238,243 individuals (20.6%) from non-European-like populations. Testing 2,997,401 rare coding variants from diverse backgrounds, we identified 800 exome-wide significant associations across 209 genes, including 176 predicted loss-of-function and 624 missense variants. Among these exome-wide associations, 130 were driven by non-European-like populations. Associated alleles were highly enriched in functional variant classes, showed significant additive and recessive associations, exhibited similar effects across populations, and resolved pathogenicity for variants enriched in African or South Asian populations. Furthermore, we identified 5 lipid-related genes associated with coronary artery disease (RORC, CFAP65, GTF2E2, PLCB3, and ZNF117). Among them, RORC is a potentially novel therapeutic target, because its silencing downregulates LDL-C. This study provides resources and insights for understanding causal mechanisms, quantifying the expressivity of rare coding alleles, and identifying novel drug targets across diverse populations.
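The cohort figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check on the reported numbers; the variable names are illustrative and do not come from the study.

```python
# Cohort sizes as reported in the abstract.
cohorts = {
    "Million Veteran Program": 634_535,
    "UK Biobank": 431_178,
    "All of Us Research Program": 92_304,
}

# The three cohorts should sum to the reported total of 1,158,017.
total = sum(cohorts.values())

# Share of participants from non-European-like populations, in percent.
non_european_like = 238_243
share_percent = 100 * non_european_like / total

# The two variant classes should add up to the 800 reported associations.
predicted_lof, missense = 176, 624
associations = predicted_lof + missense

print(total)                    # 1158017
print(round(share_percent, 1))  # 20.6
print(associations)             # 800
```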



Citations: 0

Identifying individuals at risk for surgical supravalvar aortic stenosis by polygenic risk score with graded phenotyping

medRxiv - Genetic and Genomic Medicine

Pub Date : 2024-09-18 DOI: 10.1101/2024.09.17.24313555

Delong Liu, Carolyn Beth Mervis, Mark Levin, Elisa Biamino, Maria Francesca Bedeschi, Maria Cristina Digilio, Gabriella Maria Squeo, Roberta Villa, Neelam Raja, Joy Lynne Freeman, Sharon Osgood, Giuseppe Merla, Amy Roberts, Colleen Morris, Lucy R Osborne, Beth Kozel

In a previous pathway-based, extreme phenotype study, we identified 1064 variants associated with supravalvar aortic stenosis (SVAS) severity in people with Williams syndrome (WS) and either no SVAS or surgical SVAS. Here, we use those variants to develop and test polygenic risk scores (PRS). We used the clumping and thresholding (CT) approach on the full 1064 variants and on a 427-variant subset that was part of 13 biologically relevant pathways identified in the previous study. We also used a lasso approach on the full set. We achieved an area under the curve (AUC) of >0.99 for the two CT PRS methods, using only 622 and 320 variants respectively, when 2/3 of the initial 217 participants' data were used for training and 1/3 for testing. The lasso performed less well. We then evaluated the performance of those PRS variant sets on an additional group of 138 patients with WS with intermediate-severity SVAS and found a misclassification rate of <10% between the surgical and intermediate groups, suggesting potential for clinical utility of the score.
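To make the scoring idea concrete, here is a minimal sketch of the thresholding and scoring steps of a polygenic risk score, plus a rank-based AUC. It is not the authors' pipeline: the clumping step (pruning variants in linkage disequilibrium) is omitted, and every variant ID, weight, p-value and dosage below is made-up toy data.

```python
from typing import Dict, List


def threshold_variants(pvalues: Dict[str, float], p_cutoff: float) -> List[str]:
    """Keep variants whose association p-value is at or below the cutoff."""
    return [variant for variant, p in pvalues.items() if p <= p_cutoff]


def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float],
                    selected: List[str]) -> float:
    """Weighted sum of allele dosages (0, 1 or 2) over the selected variants."""
    return sum(weights[v] * dosages.get(v, 0.0) for v in selected)


def auc(case_scores: List[float], control_scores: List[float]) -> float:
    """Probability that a random case outscores a random control (ties count 0.5)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy inputs (hypothetical, for illustration only).
pvalues = {"v1": 1e-6, "v2": 0.03, "v3": 0.4}
weights = {"v1": 0.8, "v2": 0.3, "v3": 0.1}
cases = [{"v1": 2, "v2": 1, "v3": 0}, {"v1": 1, "v2": 2, "v3": 2}]
controls = [{"v1": 0, "v2": 1, "v3": 2}, {"v1": 1, "v2": 0, "v3": 1}]

selected = threshold_variants(pvalues, p_cutoff=0.05)
case_scores = [polygenic_score(d, weights, selected) for d in cases]
control_scores = [polygenic_score(d, weights, selected) for d in controls]
toy_auc = auc(case_scores, control_scores)
print(selected, toy_auc)
```

In practice the weights and the p-value cutoff would be chosen on the training split and the AUC reported on the held-out split, as the abstract describes for its 2/3 and 1/3 partition.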



Citations: 0

Deciphering the functional impact of Alzheimer's Disease-associated variants in resting and proinflammatory immune cells

medRxiv - Genetic and Genomic Medicine

Pub Date : 2024-09-16 DOI: 10.1101/2024.09.13.24313654

Marielle L Bond, I Yoseli Quiroga-Barber, Susan D'Costa, Yijia Wu, Jessica Bell, Jessica McAfee, Nicole Kramer, Sool Lee, Mary Patrucco, Douglas Phanstiel, Hyejung Won

Genome-wide association studies have identified loci associated with Alzheimer's disease (AD), but identifying the exact causal variants and genes at each locus is challenging due to linkage disequilibrium and their largely non-coding nature. To address this, we performed a massively parallel reporter assay of 3,576 AD-associated variants in THP-1 macrophages in both resting and proinflammatory states and identified 47 expression-modulating variants (emVars). To understand the endogenous chromatin context of emVars, we built an activity-by-contact model using epigenomic maps of macrophage inflammation and inferred condition-specific enhancer-promoter pairs. Intersection of emVars with enhancer-promoter pairs and microglia expression quantitative trait loci allowed us to connect 39 emVars to 76 putative AD risk genes enriched for AD-associated molecular signatures. Overall, systematic characterization of AD-associated variants enhances our understanding of the regulatory mechanisms underlying AD pathogenesis.



Citations: 0

Plasma glucosylceramide levels are regulated by ATP10D and are not involved in Parkinson's disease pathogenesis.

medRxiv - Genetic and Genomic Medicine

Pub Date : 2024-09-16 DOI: 10.1101/2024.09.13.24313644

Emma N Somerville, Alva James, Christian Beatz, Robert Schwieger, Gal Barrel, Krishna K Kandaswamy, Marius I Iurascu, Peter Bauer, Michael Ta, Hirotaka Iwaki, Konstantin Senkevich, Eric Yu, Roy N Alcalay, Ziv Gan-Or

GBA1 variants and decreased glucocerebrosidase (GCase) activity are implicated in Parkinson's disease (PD). We investigated the hypothesis that increased levels of glucosylceramide (GlcCer), one of the main substrates of GCase, are involved in PD pathogenesis. Using multiple genetic methods, we show that ATP10D, not GBA1, is the main regulator of plasma GlcCer levels, yet it is not involved in PD pathogenesis. Plasma GlcCer levels were associated with PD, but not in a causative manner, and are not predictive of disease status. These results argue against targeting GlcCer in GBA1-PD and underscore the need to explore alternative mechanisms and biomarkers for PD.


Citations: 0

Allelic effects on KLHL17 expression likely mediated by JunB/D underlie a PDAC GWAS signal at chr1p36.33

medRxiv - Genetic and Genomic Medicine

Pub Date : 2024-09-16 DOI: 10.1101/2024.09.16.24313748

Katelyn E Connelly, Katherine Hullin, Ehssan Abdolalizadeh, Jun Zhong, Daina Eiser, Aidan O'Brien, Irene Collins, Sudipto Das, Gerard Duncan, Pancreatic Cancer Cohort Consortium, Pancreatic Cancer Case-Control Consortium, Stephen Chanock, Rachael Z Stolzenberg-Solomon, Alison Klein, Brian M Wolpin, Jason W Hoskins, Thorkell Andresson, Jill P Smith, Laufey T Amundadottir

Pancreatic ductal adenocarcinoma (PDAC) is the third leading cause of cancer-related deaths in the United States. Both rare and common germline variants contribute to PDAC risk. Here, we fine-map and functionally characterize a common PDAC risk signal at 1p36.33 (tagged by rs13303010) identified through a genome-wide association study (GWAS). One of the fine-mapped SNPs, rs13303160 (r² = 0.93 in 1000G EUR samples, OR = 1.23, P = 2.74 x 10^-9), demonstrated allele-preferential gene regulatory activity in vitro and allele-preferential binding of JunB and JunD in vitro and in vivo. Expression quantitative trait locus (eQTL) analysis identified KLHL17 as a likely target gene underlying the signal. Proteomic analysis identified KLHL17 as a member of the Cullin-E3 ubiquitin ligase complex in PDAC-derived cells. In silico differential gene expression analysis of the GTEx v8 pancreas data suggested an association between lower KLHL17 expression (risk-associated) and pro-inflammatory pathways. We hypothesize that KLHL17 may mitigate inflammation by recruiting pro-inflammatory proteins for ubiquitination and degradation, thereby influencing PDAC risk.



Citations: 0

Linking the plasma proteome to genetics in individuals from continental Africa provides insights into type 2 diabetes pathogenesis

medRxiv - Genetic and Genomic Medicine

Pub Date : 2024-09-16 DOI: 10.1101/2024.09.16.24313728

Opeyemi Soremekun, Young-chan Park, Mauro Tutino, Allan Kalungi, Moffat J. Nyirenda, Segun Fatumo, Eleftheria Zeggini

Individuals of African ancestry remain largely underrepresented in genetic and proteomic studies. Here, we measure the levels of 2,873 proteins using the Olink proximity extension assay in plasma samples from 163 individuals with type 2 diabetes (T2D) or prediabetes and 362 normoglycemic controls from the Ugandan population for the first time. We identify 88 differentially expressed proteins between the two groups and 208 proteins associated with cardiometabolic traits. We link genome-wide data to protein expression levels and construct the first protein quantitative trait locus (pQTL) map in this population. We identify 399 independent associations with 346 (86.7%) cis-pQTLs and 53 (13.3%) trans-pQTLs. 16.7% of the cis-pQTLs and all of the trans-pQTLs have not been previously reported in African-ancestry individuals. Of these, 37 pQTLs have not been previously reported in any population. We find evidence for colocalization between a pQTL for SIRPA and T2D genetic risk. Mendelian randomization analysis identified 20 proteins causally associated with T2D. Our findings reveal proteins causally implicated in the pathogenesis of T2D, which may be leveraged for personalized medicine tailored to African-ancestry individuals.



Citations: 0

A Genome-wide Association Study Identifies Novel Genetic Variants Associated with Knee Pain in the UK Biobank (N = 441,757)

medRxiv - Genetic and Genomic Medicine

Pub Date : 2024-09-16 DOI: 10.1101/2024.09.16.24313726

Yiwen Tao, Qi Pan, Tengda Cai, Luning Yang, Mainul Haque, Tania Dottorini, Weihua Meng

Knee pain is a widespread musculoskeletal condition affecting millions globally, with significant socio-economic implications. This study endeavors to identify genetic variants associated with knee pain through a comprehensive genome-wide association study (GWAS) using data from 441,757 individuals in the UK Biobank. The primary GWAS identified ten significant loci, including eight novel loci, with the most significant single nucleotide polymorphism (SNP) being rs143384 near the GDF5 gene on chromosome 20 (p = 4.68 x 10^-19). In the replication study, seven loci (rs143384, rs919642, rs55760279, rs56076919, rs3892354, rs687878, rs368636424) were found to be significant in the FinnGen cohort. Further, sex-specific analyses revealed distinct genetic associations, identifying three loci (rs143384 with p = 1.70 x 10^-15, rs56076919 with p = 1.60 x 10^-9, rs919642 with p = 1.45 x 10^-8) in females and four loci (rs2899611 with p = 2.77 x 10^-11, rs891720 with p = 5.55 x 10^-11, rs2742313 with p = 4.19 x 10^-9, rs2019689 with p = 6.51 x 10^-9) in males. The phenome-wide association analysis and Mendelian randomization analysis revealed significant links between several phenotypes and knee pain, such as leg pain on walking. These findings enhance our understanding of the genetic factors of knee pain, offering potential pathways for therapeutic interventions and personalized medical strategies.



Citations: 0

Genetic associations between SGLT2 inhibition, DPP4 inhibition or GLP1R agonism and prostate cancer risk: a two-sample Mendelian randomisation study

medRxiv - Genetic and Genomic Medicine

Pub Date : 2024-09-16 DOI: 10.1101/2024.09.15.24313695

Lin Shen, Yifang Yang, Lei Lu, Oscar Hou In Chou, Quinncy Lee, Tong Liu, Guoliang Li, Shuk Han Cheng, Gary Tse, Jiandong Zhou

Background: Epidemiological studies have linked the use of the anti-diabetic medications sodium-glucose co-transporter-2 inhibitors (SGLT2I), dipeptidyl peptidase-4 inhibitors (DPP4I) and glucagon-like peptide-1 receptor agonists (GLP1RA) with prostate cancer risk. However, these studies cannot infer causality. Methods: This was a two-sample Mendelian randomization (MR) study using genome-wide association study (GWAS) data, designed to identify causal relationships between SGLT2I, DPP4I or GLP1RA and prostate cancer. Genetic associations with HbA1c and risk of prostate cancer were extracted from the IEU Open-GWAS Project database with GWAS IDs ukb-d-30750_irnt (UK Biobank cohort) and ebi-a-GCST006085 (European Molecular Biology Laboratory's European Bioinformatics Institute cohort), respectively. The two GWAS datasets chosen were obtained from individuals of European ancestry to minimise potential bias from population stratification. The genes encoding the targets of SGLT2I, DPP4I and GLP1RA were SLC5A2, DPP4 and GLP1R, located at Chr16: 31494323-31502181, Chr2: 162848755-162930904 and Chr6: 39016557-39059079, respectively. Results: A total of 31, 2 and 5 single nucleotide variants (SNVs) were used for SLC5A2, DPP4 and GLP1R, respectively. Our MR analysis results supported a causal relationship between genetic variation in SLC5A2 and DPP4 and reduced risk of prostate cancer at the Bonferroni-corrected threshold, with odds ratios (OR) [95% confidence intervals] of 0.47 [0.38-0.58] and 0.35 [0.24-0.53], but not for GLP1R (OR: 1.39 [0.93-2.07]). Sensitivity analyses by the leave-one-out method did not significantly alter the OR for SGLT2I. Conclusions: The two-sample MR analysis found that SGLT2 and DPP4 inhibition, but not GLP1R agonism, was associated with lower risks of developing prostate cancer.
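As a rough illustration of how a two-sample MR estimate turns per-variant summary statistics into an odds ratio, the sketch below implements the inverse-variance weighted (IVW) estimator, one commonly used choice. The abstract does not say which estimator the authors used, so this is an assumption, and all effect sizes below are made-up toy values rather than data from the study.

```python
import math
from typing import List, Tuple


def ivw_estimate(beta_exposure: List[float], beta_outcome: List[float],
                 se_outcome: List[float]) -> Tuple[float, float]:
    """Fixed-effect IVW estimate of the causal log-odds and its standard error.

    Each variant contributes its Wald ratio (beta_outcome / beta_exposure),
    weighted by beta_exposure**2 / se_outcome**2.
    """
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / se ** 2
                      for bx, se in zip(beta_exposure, se_outcome))
    return numerator / denominator, math.sqrt(1.0 / denominator)


def odds_ratio_with_ci(beta: float, se: float,
                       z: float = 1.96) -> Tuple[float, float, float]:
    """Convert a log-odds estimate into an odds ratio with an approximate 95% CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Toy summary statistics for two hypothetical variants in a drug-target gene.
beta_exposure = [0.10, 0.20]   # variant effect on the exposure (e.g. HbA1c)
beta_outcome = [-0.05, -0.10]  # variant effect on the outcome (log-odds)
se_outcome = [0.01, 0.02]      # standard errors of the outcome effects

beta_ivw, se_ivw = ivw_estimate(beta_exposure, beta_outcome, se_outcome)
or_ivw, ci_low, ci_high = odds_ratio_with_ci(beta_ivw, se_ivw)
print(round(beta_ivw, 3), round(or_ivw, 3))
```

An odds ratio whose whole confidence interval lies below 1, as in this toy case, is the pattern the abstract reports for SLC5A2 and DPP4.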



Citations: 0

Plasma proteomic signatures for type 2 diabetes mellitus and related traits in the UK Biobank cohort

medRxiv - Genetic and Genomic Medicine

Pub Date : 2024-09-15 DOI: 10.1101/2024.09.13.24313501

Trisha P. Gupte, Zahra Azizi, Pik Fang Kho, Jiayan Zhou, Kevin Nzenkue, Ming-Li Chen, Daniel J. Panyard, Rodrigo Guarischi-Sousa, Austin T. Hilliard, Disha Sharma, Kathleen Watson, Fahim Abbasi, Philip S. Tsao, Shoa L. Clarke, Themistocles L. Assimes

Aims/hypothesis: The plasma proteome holds promise as a diagnostic and prognostic tool that can accurately reflect complex human traits and disease processes. We assessed the ability of plasma proteins to predict type 2 diabetes mellitus (T2DM) and related traits. Methods: Clinical, genetic, and high-throughput proteomic data from three subcohorts of UK Biobank participants were analyzed for association with dual-energy x-ray absorptiometry (DXA) derived truncal fat (in the adiposity subcohort), estimated maximum oxygen consumption (VO2max) (in the fitness subcohort), and incident T2DM (in the T2DM subcohort). We used least absolute shrinkage and selection operator (LASSO) regression to assess the relative ability of non-proteomic and proteomic variables to associate with each trait by comparing variance explained (R2) and area under the curve (AUC) statistics between data types. Stability selection with randomized LASSO regression identified the most robustly associated proteins for each trait. The benefit of proteomic signatures (PSs) over QDiabetes, a T2DM clinical risk score, was evaluated through the derivation of delta (∆) AUC values. We also assessed the incremental gain in model performance metrics using proteomic datasets with varying numbers of proteins. A series of two-sample Mendelian randomization (MR) analyses were conducted to identify potentially causal proteins for adiposity, fitness, and T2DM. Results: Across all three subcohorts, the mean age was 56.7 years and 54.9% were female. In the T2DM subcohort, 5.8% developed incident T2DM over a median follow-up of 7.6 years. LASSO-derived PSs increased the R2 of truncal fat and VO2max over clinical and genetic factors by 0.074 and 0.057, respectively. 
We observed a similar improvement in T2DM prediction over the QDiabetes score [Δ AUC: 0.016 (95% CI 0.008, 0.024)] when using a robust PS derived strictly from the T2DM outcome versus a model further augmented with non-overlapping proteins associated with adiposity and fitness. A small number of proteins (29 for truncal adiposity, 18 for VO2max, and 26 for T2DM) identified by stability selection algorithms offered most of the improvement in prediction of each outcome. Filtered and clustered versions of the full proteomic dataset supplied by the UK Biobank (ranging from 600 to 1,500 proteins) performed comparably to the full dataset for T2DM prediction. Using MR, we identified 4 proteins as potentially causal for adiposity, 1 as potentially causal for fitness, and 4 as potentially causal for T2DM. Conclusions/Interpretation: Plasma PSs modestly improve the prediction of incident T2DM over that possible with clinical and genetic factors. Further studies are warranted to better elucidate the clinical utility of these signatures in predicting the risk of T2DM over the standard practice of using the QDiabetes score. Candidate causally associated proteins identified through MR deserve further study as potential novel therapeutic targets for T2DM.
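
The stability selection step mentioned in this abstract can be illustrated with a small sketch. It is not the authors' pipeline: it assumes a continuous outcome (such as truncal fat), features and outcome that are already centered, a linear LASSO fitted by coordinate descent, and the common randomized-LASSO recipe of rescaling each feature's penalty by a random weight on repeated half-samples. The function names, the `weakness` parameter and the simulated data are all hypothetical.

```python
import random


def soft_threshold(value, penalty):
    """Shrink a value toward zero by `penalty`, returning 0 inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalties, max_sweeps=100, tol=1e-8):
    """Minimise (1/2n)*||y - Xb||^2 + sum_j penalties[j]*|b_j| (no intercept)."""
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    residual = list(y)  # equals y - Xb while all coefficients are zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(max_sweeps):
        largest_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * coef[j]) for i in range(n)) / n
            updated = soft_threshold(rho, penalties[j]) / col_sq[j]
            change = updated - coef[j]
            if change != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * change
                coef[j] = updated
                largest_change = max(largest_change, abs(change))
        if largest_change < tol:
            break
    return coef


def stability_selection(X, y, penalty, n_subsamples=20, weakness=0.5, seed=0):
    """Fraction of random half-samples in which each feature gets a nonzero coefficient."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        rows = rng.sample(range(n), n // 2)
        # Randomized LASSO: inflate each feature's penalty by a random factor.
        penalties = [penalty / rng.uniform(weakness, 1.0) for _ in range(p)]
        coef = lasso_coordinate_descent([X[i] for i in rows], [y[i] for i in rows], penalties)
        for j, value in enumerate(coef):
            if value != 0.0:
                counts[j] += 1
    return [count / n_subsamples for count in counts]


# Simulated data: only the first of three features truly affects the outcome.
data_rng = random.Random(1)
X_demo = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y_demo = [2.0 * row[0] + data_rng.gauss(0, 0.5) for row in X_demo]
print("Selection frequencies:", stability_selection(X_demo, y_demo, penalty=0.3))
```

Features that are selected in nearly every half-sample are treated as robustly associated; a binary outcome such as incident T2DM would need a penalized logistic model in place of the linear one used here.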



Citations: 0

A plasma proteomic signature for atherosclerotic cardiovascular disease risk prediction in the UK Biobank cohort

medRxiv - Genetic and Genomic Medicine

Pub Date : 2024-09-15 DOI: 10.1101/2024.09.13.24313652

Trisha P. Gupte, Zahra Azizi, Pik Fang Kho, Jiayan Zhou, Ming-Li Chen, Daniel J. Panyard, Rodrigo Guarischi-Sousa, Austin T. Hilliard, Disha Sharma, Kathleen Watson, Fahim Abbasi, Shoa L. Clarke, Themistocles L. Assimes

Background: While risk stratification for atherosclerotic cardiovascular disease (ASCVD) is essential for primary prevention, current clinical risk algorithms demonstrate variability and leave room for further improvement. The plasma proteome holds promise as a future diagnostic and prognostic tool that can accurately reflect complex human traits and disease processes. We assessed the ability of plasma proteins to predict ASCVD. Method: Clinical, genetic, and high-throughput plasma proteomic data were analyzed for association with ASCVD in a cohort of 41,650 UK Biobank participants. Selected features for analysis included clinical variables such as a UK-based cardiovascular clinical risk score (QRISK3) and lipid levels, 36 polygenic risk scores (PRSs), and Olink protein expression data of 2,920 proteins. We used least absolute shrinkage and selection operator (LASSO) regression to select features and compared area under the curve (AUC) statistics between data types. Randomized LASSO regression with a stability selection algorithm identified a smaller set of more robustly associated proteins. The benefit of plasma proteins over standard clinical variables, the QRISK3 score, and PRSs was evaluated through the derivation of Δ AUC values. We also assessed the incremental gain in model performance using proteomic datasets with varying numbers of proteins. To identify potential causal proteins for ASCVD, we conducted a two-sample Mendelian randomization (MR) analysis. Result: The mean age of our cohort was 54.3 years, 53.3% were female, and 9.9% developed incident ASCVD over a median follow-up of 6.9 years. A protein-only LASSO model selected 294 proteins and returned an AUC of 0.723 (95% CI 0.708-0.737). A clinical variable and PRS-only LASSO model selected 4 clinical variables and 20 PRSs and achieved an AUC of 0.726 (95% CI 0.712-0.741). The addition of the full proteomic dataset to clinical variables and PRSs resulted in a Δ AUC of 0.010 (95% CI 0.003-0.018). 
Fifteen proteins selected by a stability selection algorithm offered improvement in ASCVD prediction over the QRISK3 risk score [Δ AUC: 0.013 (95% CI 0.005-0.021)]. Filtered and clustered versions of the full proteomic dataset (consisting of 600-1,500 proteins) performed comparably to the full dataset for ASCVD prediction. Using MR, we identified 12 proteins as potentially causal for ASCVD. Conclusion: A plasma proteomic signature performs well for incident ASCVD prediction but only modestly improves prediction over clinical and genetic factors. Further studies are warranted to better elucidate the clinical utility of this signature in predicting the risk of ASCVD over the standard practice of using the QRISK3 score.
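
The Δ AUC comparisons reported above can be illustrated with a short sketch. It assumes a rank-based AUC and a percentile bootstrap for the confidence interval; the abstract does not state how its intervals were derived, so the bootstrap is an assumption. The function names and the toy scores are hypothetical.

```python
import random


def auc(scores, labels):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    cases = [s for s, label in zip(scores, labels) if label == 1]
    controls = [s for s, label in zip(scores, labels) if label == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def delta_auc_bootstrap(base_scores, augmented_scores, labels, n_boot=200, seed=0):
    """Point estimate and percentile-bootstrap 95% CI for AUC(augmented) - AUC(base)."""
    rng = random.Random(seed)
    n = len(labels)
    point = auc(augmented_scores, labels) - auc(base_scores, labels)
    deltas = []
    while len(deltas) < n_boot:
        rows = [rng.randrange(n) for _ in range(n)]
        boot_labels = [labels[i] for i in rows]
        if len(set(boot_labels)) < 2:
            continue  # resample lacks cases or controls, so AUC is undefined
        boot_base = auc([base_scores[i] for i in rows], boot_labels)
        boot_augmented = auc([augmented_scores[i] for i in rows], boot_labels)
        deltas.append(boot_augmented - boot_base)
    deltas.sort()
    lower = deltas[int(0.025 * n_boot)]
    upper = deltas[int(0.975 * n_boot) - 1]
    return point, lower, upper


# Toy example: risk scores from a baseline model and a protein-augmented model.
labels_demo = [0, 0, 1, 1]
baseline = [0.10, 0.40, 0.35, 0.80]
augmented = [0.10, 0.20, 0.60, 0.90]
print("Baseline AUC:", auc(baseline, labels_demo))
print("Delta AUC with 95% CI:", delta_auc_bootstrap(baseline, augmented, labels_demo))
```

Both models are scored on the same resampled participants in each bootstrap draw, so the interval reflects the paired difference and not two independent AUC estimates.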



Citations: 0