PNL: a software to build polygenic risk scores using a super learner approach based on PairNet, a Convolutional Neural Network.

Ting-Huei Chen, Chia-Jung Lee, Syue-Pu Chen, Shang-Jung Wu, Cathy S J Fann
Bioinformatics (Oxford, England). Published 2025-02-04. DOI: 10.1093/bioinformatics/btaf071

Abstract

Summary: Polygenic risk scores (PRSs) hold promise for early disease diagnosis and personalized treatment, but their overall discriminative power remains limited for many diseases in the general population. As a result, numerous novel PRS modeling techniques have been developed to improve predictive performance, yet which method works best for a specific application remains uncertain until it is tested. We therefore introduce a novel, versatile tool that builds an optimized PRS model by integrating candidate models from multiple existing PRS-building methods, which may use target-population data and/or incorporate information from other populations through a trans-ethnic approach. Our tool, PNL, is based on the PairNet algorithm, a convolutional neural network whose simple pairing operation keeps computational complexity low. In case studies of asthma, type 2 diabetes, and vertigo, the optimal PRS model generated by PNL using only Taiwan Biobank (TWB) data achieved areas under the curve (AUCs) that matched or exceeded the best results obtained with any individual method. Incorporating UK Biobank (UKBB) data further improved the performance of PNL for asthma and type 2 diabetes. For vertigo, unlike the other diseases, analyses of individual methods showed that UKBB data alone generally produced lower AUCs than TWB data alone. Correspondingly, incorporating UKBB data did not improve the AUC of PNL for vertigo, suggesting that increasing the number of candidate models does not necessarily result in higher AUC values, which alleviates concerns about overfitting.
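The summary describes PNL as a super learner: candidate PRS models from several existing methods are combined by a meta-model (PairNet), and the combination is judged by AUC. The sketch below illustrates only that general stacking idea, under loudly stated assumptions: the candidate PRSs are synthetic noisy copies of a latent liability, and a plain logistic-regression meta-learner stands in for PairNet, whose pairing architecture is not reproduced here. The function names (`auc`, `fit_logistic`, `predict_logistic`, `demo`) are hypothetical and are not part of the PNL or pairnet code.

```python
import numpy as np


def auc(y, score):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) formula.

    Tied scores receive their average rank, so ties count as half a win.
    """
    y = np.asarray(y, dtype=bool)
    score = np.asarray(score, dtype=float)
    order = np.argsort(score, kind="mergesort")
    sorted_scores = score[order]
    ranks = np.empty(len(score), dtype=float)
    i = 0
    while i < len(score):
        # Extend j over the block of scores tied with sorted_scores[i].
        j = i
        while j + 1 < len(score) and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # average 1-based rank
        i = j + 1
    n_pos, n_neg = y.sum(), (~y).sum()
    return (ranks[y].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def fit_logistic(X, y, lr=0.1, n_iter=2000, l2=1e-3):
    """Logistic regression (intercept + weights) fitted by batch gradient descent.

    Hypothetical stand-in for PairNet as the meta-learner over candidate PRSs.
    """
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        # Mean log-loss gradient plus a small L2 penalty (intercept not penalized).
        grad = Xb.T @ (p - y) / len(y) + l2 * np.r_[0.0, w[1:]]
        w -= lr * grad
    return w


def predict_logistic(w, X):
    """Predicted case probability from the fitted meta-learner."""
    return 1.0 / (1.0 + np.exp(-(w[0] + X @ w[1:])))


def demo(seed=0, n=4000, k=5):
    """Compare single candidate PRSs with a stacked combination on held-out data."""
    rng = np.random.default_rng(seed)
    liability = rng.normal(size=n)  # latent genetic liability (synthetic)
    # Hypothetical candidate PRSs: noisy copies of the liability, of varying quality.
    noise_sd = np.linspace(1.0, 2.0, k)
    prs = liability[:, None] + rng.normal(size=(n, k)) * noise_sd
    y = (liability + rng.normal(size=n) > 1.0).astype(float)  # synthetic case status
    train = np.arange(n) < n // 2
    test = ~train
    # Standardize with training-set statistics only, to avoid leakage.
    mu, sd = prs[train].mean(axis=0), prs[train].std(axis=0)
    z = (prs - mu) / sd
    w = fit_logistic(z[train], y[train])
    single = [auc(y[test], z[test, j]) for j in range(k)]
    stacked = auc(y[test], predict_logistic(w, z[test]))
    return single, stacked


if __name__ == "__main__":
    single, stacked = demo()
    print("best single-candidate AUC:", round(max(single), 3))
    print("stacked AUC:", round(stacked, 3))
```

The meta-learner is fitted on one half of the individuals and scored on the other, mirroring the need to judge a combined PRS on data not used to build it. In this synthetic setting the candidates' errors are independent, so stacking usually helps; that need not hold for real candidate PRSs.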

Availability and implementation: The Python code for the PairNet algorithm incorporated in PNL is freely available at https://github.com/FannLab/pairnet. An archived, citable version is stored at https://doi.org/10.5281/zenodo.14838227.

Contact: Correspondence should be addressed to the corresponding author.

Supplementary information: Detailed implementation procedures are provided in the supplementary materials.