DYNA: Disease-Specific Language Model for Variant Pathogenicity

arXiv - QuanBio - Genomics | Pub Date: 2024-05-31 | DOI: arxiv-2406.00164

Huixin Zhan, Zijun Zhang


Citations: 0

Abstract

Classifying genetic variants as pathogenic or benign remains a challenge in clinical genetics. Recently, genomic foundation models have improved the accuracy of generic variant effect prediction (VEP) via weakly-supervised or unsupervised training. However, these variant effect predictors are not disease-specific, which limits their adaptation at the point of care. To address this problem, we propose DYNA, a disease-specificity fine-tuning approach based on a Siamese neural network. DYNA is broadly applicable to all genomic foundation models and enables more effective variant effect prediction in disease-specific contexts. We evaluate DYNA on two distinct disease-relevant tasks. For coding VEP, we focus on various cardiovascular diseases, where loss-of-function vs. gain-of-function gene-disease relationships dictate disease-specific VEP. For non-coding VEP, we apply DYNA to RNA splicing, an essential post-transcriptional regulatory axis and the most common non-coding pathogenic mechanism in established clinical VEP guidelines. In both cases, DYNA fine-tunes various pre-trained genomic foundation models on small sets of rare variants. The DYNA fine-tuned models show superior performance on the held-out rare variant test set, and the results are further replicated on large, clinically relevant variant annotations in ClinVar. Thus, DYNA offers a potent disease-specific variant effect prediction method that excels at intra-gene generalization and at generalization to unseen genetic variants, making it particularly valuable for disease association and clinical applications.
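To make the Siamese idea concrete, the following is a minimal sketch of how a pair of variant embeddings could be scored through one shared encoder. It is not the authors' implementation: the abstract does not state the loss function, the encoder architecture, or how pairs are formed. The contrastive loss, the single tanh layer, the margin value, and the toy embedding vectors are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math
import random


def encode(weights, x):
    """Shared encoder (assumed): one linear layer followed by tanh."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def distance(u, v):
    """Euclidean distance between two encoded vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(d, same_label, margin=1.0):
    """Assumed loss: pull same-label pairs together and push
    different-label pairs at least `margin` apart."""
    if same_label:
        return d ** 2
    return max(0.0, margin - d) ** 2


def siamese_pair_loss(weights, x1, x2, same_label, margin=1.0):
    # Both inputs pass through the SAME weights; that sharing is what
    # makes the network "Siamese".
    z1 = encode(weights, x1)
    z2 = encode(weights, x2)
    return contrastive_loss(distance(z1, z2), same_label, margin)


# Toy stand-ins for foundation-model embeddings of three variants.
# Real embeddings would come from a pre-trained genomic model.
random.seed(0)
dim_in, dim_out = 4, 2
weights = [[random.uniform(-0.5, 0.5) for _ in range(dim_in)]
           for _ in range(dim_out)]
variant_a = [0.2, -0.1, 0.4, 0.0]
variant_b = [0.2, -0.1, 0.4, 0.0]    # identical to variant_a
variant_c = [-0.9, 0.8, -0.3, 0.5]

loss_same = siamese_pair_loss(weights, variant_a, variant_b, same_label=True)
loss_diff = siamese_pair_loss(weights, variant_a, variant_c, same_label=False)
print(loss_same, loss_diff)
```

In an actual fine-tuning loop, the gradient of this pair loss would update the shared weights (and possibly the foundation model itself), so that variants with the same disease-specific label end up close together in the embedding space.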


