DYNA: Disease-Specific Language Model for Variant Pathogenicity

Huixin Zhan, Zijun Zhang
{"title":"DYNA: Disease-Specific Language Model for Variant Pathogenicity","authors":"Huixin Zhan, Zijun Zhang","doi":"arxiv-2406.00164","DOIUrl":null,"url":null,"abstract":"Clinical variant classification of pathogenic versus benign genetic variants\nremains a challenge in clinical genetics. Recently, the proposition of genomic\nfoundation models has improved the generic variant effect prediction (VEP)\naccuracy via weakly-supervised or unsupervised training. However, these VEPs\nare not disease-specific, limiting their adaptation at the point of care. To\naddress this problem, we propose DYNA: Disease-specificity fine-tuning via a\nSiamese neural network broadly applicable to all genomic foundation models for\nmore effective variant effect predictions in disease-specific contexts. We\nevaluate DYNA in two distinct disease-relevant tasks. For coding VEPs, we focus\non various cardiovascular diseases, where gene-disease relationships of\nloss-of-function vs. gain-of-function dictate disease-specific VEP. For\nnon-coding VEPs, we apply DYNA to an essential post-transcriptional regulatory\naxis of RNA splicing, the most common non-coding pathogenic mechanism in\nestablished clinical VEP guidelines. In both cases, DYNA fine-tunes various\npre-trained genomic foundation models on small, rare variant sets. The DYNA\nfine-tuned models show superior performance in the held-out rare variant\ntesting set and are further replicated in large, clinically-relevant variant\nannotations in ClinVAR. Thus, DYNA offers a potent disease-specific variant\neffect prediction method, excelling in intra-gene generalization and\ngeneralization to unseen genetic variants, making it particularly valuable for\ndisease associations and clinical applicability.","PeriodicalId":501070,"journal":{"name":"arXiv - QuanBio - Genomics","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - QuanBio - Genomics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2406.00164","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Clinical variant classification of pathogenic versus benign genetic variants remains a challenge in clinical genetics. Recently, the proposition of genomic foundation models has improved the generic variant effect prediction (VEP) accuracy via weakly-supervised or unsupervised training. However, these VEPs are not disease-specific, limiting their adaptation at the point of care. To address this problem, we propose DYNA: Disease-specificity fine-tuning via a Siamese neural network broadly applicable to all genomic foundation models for more effective variant effect predictions in disease-specific contexts. We evaluate DYNA in two distinct disease-relevant tasks. For coding VEPs, we focus on various cardiovascular diseases, where gene-disease relationships of loss-of-function vs. gain-of-function dictate disease-specific VEP. For non-coding VEPs, we apply DYNA to an essential post-transcriptional regulatory axis of RNA splicing, the most common non-coding pathogenic mechanism in established clinical VEP guidelines. In both cases, DYNA fine-tunes various pre-trained genomic foundation models on small, rare variant sets. The DYNA fine-tuned models show superior performance in the held-out rare variant testing set and are further replicated in large, clinically-relevant variant annotations in ClinVAR. Thus, DYNA offers a potent disease-specific variant effect prediction method, excelling in intra-gene generalization and generalization to unseen genetic variants, making it particularly valuable for disease associations and clinical applicability.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
DYNA:变异致病性疾病特异性语言模型
对致病与良性遗传变异进行临床变异分类仍是临床遗传学的一项挑战。最近,基因组基础模型的提出通过弱监督或无监督训练提高了通用变异效应预测(VEP)的准确性。然而,这些变异效应预测模型并非针对特定疾病,这限制了它们在医疗点的适应性。为了解决这个问题,我们提出了 DYNA:通过暹罗神经网络进行疾病特异性微调,它广泛适用于所有基因组基础模型,能在疾病特异性背景下更有效地预测变异效应。我们在两个不同的疾病相关任务中对 DYNA 进行了评估。对于编码 VEP,我们关注各种心血管疾病,其中功能缺失与功能增益的基因-疾病关系决定了特定疾病的 VEP。对于非编码 VEP,我们将 DYNA 应用于 RNA 剪接这一重要的转录后调控轴,这是临床 VEP 指南中最常见的非编码致病机制。在这两种情况下,DYNA 都会在小型、罕见的变异集上对各种预训练基因组基础模型进行微调。经过 DYNA 微调的模型在保留的罕见变异测试集中表现出了卓越的性能,并在 ClinVAR 中的大型临床相关变异注释中得到了进一步复制。因此,DYNA 提供了一种有效的疾病特异性变异效应预测方法,在基因内泛化和泛化到未见过的基因变异方面表现出色,使其在疾病关联和临床应用方面特别有价值。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Allium Vegetables Intake and Digestive System Cancer Risk: A Study Based on Mendelian Randomization, Network Pharmacology and Molecular Docking wgatools: an ultrafast toolkit for manipulating whole genome alignments Selecting Differential Splicing Methods: Practical Considerations Advancements in colored k-mer sets: essentials for the curious Advancements in practical k-mer sets: essentials for the curious
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1