Extensive co-regulation of neighbouring genes complicates the use of eQTLs in target gene prioritisation.

HGG Advances (IF 3.3; JCR Q2, Genetics & Heredity). Pub Date: 2024-08-28. DOI: 10.1016/j.xhgg.2024.100348
Ralf Tambets, Anastassia Kolde, Peep Kolberg, Michael I Love, Kaur Alasoo

Abstract

Identifying the causal genes underlying genome-wide association study (GWAS) signals is a fundamental problem in human genetics. Although colocalisation with gene expression quantitative trait loci (eQTLs) is often used to prioritise GWAS target genes, systematic benchmarking has been limited by the lack of large ground truth datasets. Here, we re-analysed plasma protein QTL (pQTL) data from 3,301 individuals of the INTERVAL cohort together with 131 eQTL Catalogue datasets. Focusing on variants located within or close to the gene encoding the affected protein, we identified 793 proteins with at least one cis-pQTL for which we could assume that the most likely causal gene was the gene coding for the protein. We then benchmarked the ability of cis-eQTLs to recover these causal genes by comparing three Bayesian colocalisation methods (coloc.susie, coloc.abf and CLPP) and five Mendelian randomisation (MR) approaches (three varieties of inverse-variance weighted MR, MR-RAPS, and MRLocus). We found that assigning fine-mapped pQTLs to their closest protein-coding genes outperformed all colocalisation methods in both precision (71.9%) and recall (76.9%). Furthermore, the colocalisation method with the highest recall (coloc.susie, 46.3%) also had the lowest precision (45.1%). Combining evidence from multiple conditionally distinct colocalising QTLs with MR increased precision to 81%, but this was accompanied by a large reduction in recall to 7.1%. The choice of MR method also greatly affected performance, with the standard inverse-variance weighted MR often producing many false positives. Our results highlight that linking GWAS variants to target genes remains challenging with eQTL evidence alone, and that prioritising novel targets requires triangulation of evidence from multiple sources.
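To make the reported precision and recall figures concrete, the following minimal Python sketch shows one way such a benchmark could be scored. It is an illustration under stated assumptions, not the authors' pipeline: the function name `precision_recall`, the toy identifiers, and the exact definitions (precision as correct gene nominations over all nominations, recall as correctly recovered proteins over all benchmark proteins) are hypothetical, since the abstract does not spell them out.

```python
from typing import Dict, Set, Tuple


def precision_recall(
    predictions: Dict[str, Set[str]],
    truth: Dict[str, str],
) -> Tuple[float, float]:
    """Score gene nominations against an assumed ground truth.

    predictions: protein id -> set of genes a method nominates for its cis-pQTL
    truth:       protein id -> gene coding for that protein (assumed causal)
    """
    # Only score proteins that are part of the benchmark set.
    scored = {p: genes for p, genes in predictions.items() if p in truth}
    n_nominated = sum(len(genes) for genes in scored.values())
    n_correct = sum(1 for p, genes in scored.items() if truth[p] in genes)
    precision = n_correct / n_nominated if n_nominated else 0.0
    recall = n_correct / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical toy benchmark with four proteins (identifiers are made up).
truth = {"P1": "GENE_A", "P2": "GENE_B", "P3": "GENE_C", "P4": "GENE_D"}

# Closest-gene heuristic: exactly one nomination per fine-mapped pQTL.
closest = {"P1": {"GENE_A"}, "P2": {"GENE_B"}, "P3": {"GENE_X"}}

# Colocalisation-style output: co-regulated neighbours can be nominated
# alongside (or instead of) the coding gene, which lowers precision.
coloc = {"P1": {"GENE_A", "GENE_Y"}, "P2": {"GENE_Z"}}

for name, preds in [("closest gene", closest), ("colocalisation", coloc)]:
    p, r = precision_recall(preds, truth)
    print(f"{name}: precision={p:.2f}, recall={r:.2f}")
```

In this toy setting, every extra neighbouring gene nominated for the same pQTL counts against precision, which is the kind of effect the title attributes to co-regulation of neighbouring genes.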
