DYNA: Disease-Specific Language Model for Variant Pathogenicity
Huixin Zhan, Zijun Zhang
arXiv - QuanBio - Genomics, 2024-05-31, arxiv-2406.00164
Clinical variant classification of pathogenic versus benign genetic variants
remains a challenge in clinical genetics. Recently, the introduction of genomic
foundation models has improved the accuracy of generic variant effect prediction
(VEP) via weakly supervised or unsupervised training. However, these predictors
are not disease-specific, which limits their adaptation at the point of care. To
address this problem, we propose DYNA, a disease-specificity fine-tuning
approach that uses a Siamese neural network and is broadly applicable to all
genomic foundation models, for more effective variant effect prediction in
disease-specific contexts. We
evaluate DYNA in two distinct disease-relevant tasks. For coding VEPs, we focus
on various cardiovascular diseases, where gene-disease relationships of
loss-of-function vs. gain-of-function dictate disease-specific VEP. For
non-coding VEPs, we apply DYNA to an essential post-transcriptional regulatory
axis of RNA splicing, the most common non-coding pathogenic mechanism in
established clinical VEP guidelines. In both cases, DYNA fine-tunes various
pre-trained genomic foundation models on small sets of rare variants. The DYNA
fine-tuned models show superior performance on the held-out rare variant
test set, and the results are further replicated in large, clinically relevant
variant annotations in ClinVar. Thus, DYNA offers a potent disease-specific variant
effect prediction method that excels in intra-gene generalization and in
generalization to unseen genetic variants, making it particularly valuable for
disease associations and clinical application.
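
The abstract does not spell out the Siamese training objective, so the following is only a minimal sketch of the general idea, under stated assumptions: two variants pass through the same encoder (shared weights), and a pairwise contrastive loss pulls same-label pairs together and pushes different-label pairs at least a margin apart. The linear `shared_encoder`, the `contrastive_loss` form, the `margin` value, and the toy feature vectors are all hypothetical illustrations and are not taken from DYNA.

```python
import math


def shared_encoder(features, weights):
    """Hypothetical stand-in for a foundation-model encoder: a linear map.

    Both branches of the Siamese pair call this with the SAME weights.
    """
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def euclidean_distance(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same_label, margin=1.0):
    """Classic pairwise contrastive loss (an assumed objective, not DYNA's).

    Same-label pairs are penalized by squared distance; different-label
    pairs are penalized only if they are closer than `margin`.
    """
    d = euclidean_distance(emb_a, emb_b)
    if same_label:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Toy example: identity weights and made-up 2-D variant features.
weights = [[1.0, 0.0], [0.0, 1.0]]
pathogenic_variant = shared_encoder([0.0, 0.0], weights)
benign_variant = shared_encoder([3.0, 4.0], weights)

# Different labels and already far apart, so this pair adds no loss.
pair_loss = contrastive_loss(pathogenic_variant, benign_variant, same_label=False)
```

In a real fine-tuning setup, the linear map would be replaced by the pre-trained genomic foundation model, and the loss would be minimized by gradient descent over many variant pairs.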