Accurate Identification and Mechanistic Evaluation of Pathogenic Missense Variants with Rhapsody-2.

bioRxiv: the preprint server for biology. Pub Date: 2025-03-06. DOI: 10.1101/2025.02.17.638727

Anupam Banerjee, Anthony Bogetti, Ivet Bahar

Abstract

Understanding the effects of missense mutations, or single amino acid variants (SAVs), on protein function is crucial for elucidating the molecular basis of diseases and disorders and for designing rational therapies. We introduce here Rhapsody-2, a machine learning tool for discriminating between pathogenic and neutral SAVs that significantly expands on its precursor, whose coverage was limited by the availability of structural data. Taking advantage of AlphaFold2 as a powerful tool for structure prediction, Rhapsody-2 is trained on a substantially expanded dataset of 117,525 SAVs in 12,094 human proteins reported in the ClinVar database. Using a broad set of sequence-evolutionary, structural, dynamic, and energetic descriptors as features, Rhapsody-2 achieved an AUROC of 0.94 in 10-fold cross-validation in which all SAVs of a given test protein were excluded from the training set. Benchmarking against a variety of test datasets further demonstrated its high performance. While sequence-evolutionary descriptors play a dominant role in pathogenicity prediction, descriptors based on structural dynamics provide a mechanistic interpretation. Notably, mutations at residues involved in allosteric communication, at residues distinguished by pronounced fluctuations in high-frequency modes of motion, or at residues subject to spatial constraints in soft modes are usually pathogenic. Overall, Rhapsody-2 provides an efficient and transparent tool for accurately predicting the pathogenicity of SAVs and unraveling the mechanistic basis of the observed behavior, thus advancing our understanding of genotype-to-phenotype relations.
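The evaluation protocol behind the reported AUROC (10-fold cross-validation in which every SAV of a test protein is withheld from training) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: folds are formed by grouping SAVs on a protein identifier, out-of-fold scores are pooled into a single AUROC (the paper may instead average per-fold values), and the mean-difference linear scorer and synthetic records are hypothetical stand-ins for Rhapsody-2's actual classifier and descriptors. All function names here are illustrative.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# One SAV record: (protein_id, feature_vector, label), label 1 = pathogenic, 0 = neutral.
Record = Tuple[str, List[float], int]


def protein_grouped_folds(records: Sequence[Record], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Split record indices into k folds such that all SAVs of a protein share one fold,
    so no protein ever contributes to both the training and the test set."""
    by_protein: Dict[str, List[int]] = defaultdict(list)
    for i, (protein, _, _) in enumerate(records):
        by_protein[protein].append(i)
    proteins = sorted(by_protein)
    random.Random(seed).shuffle(proteins)
    folds: List[List[int]] = [[] for _ in range(k)]
    # Greedy size balancing: largest protein groups first, each into the currently smallest fold.
    for protein in sorted(proteins, key=lambda p: -len(by_protein[p])):
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(by_protein[protein])
    return folds


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of a random pathogenic SAV > score of a random neutral SAV),
    with ties counted as 1/2 (Mann-Whitney formulation; O(n_pos * n_neg))."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one pathogenic and one neutral SAV")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_validated_auroc(records: Sequence[Record], k: int,
                          fit: Callable[[List[Record]], object],
                          predict: Callable[[object, List[float]], float]) -> float:
    """Protein-grouped k-fold CV; out-of-fold scores are pooled into one AUROC (an assumption)."""
    oof = [0.0] * len(records)
    for test_idx in protein_grouped_folds(records, k):
        if not test_idx:  # possible when there are fewer proteins than folds
            continue
        held_out = set(test_idx)
        model = fit([r for i, r in enumerate(records) if i not in held_out])
        for i in test_idx:
            oof[i] = predict(model, records[i][1])
    return auroc([r[2] for r in records], oof)


# --- Hypothetical stand-ins, NOT Rhapsody-2's classifier or descriptors ---
def fit_mean_difference(train: List[Record]) -> List[float]:
    """Toy linear model: weight_j = mean_j(pathogenic) - mean_j(neutral)."""
    dim = len(train[0][1])
    sums = {0: [0.0] * dim, 1: [0.0] * dim}
    counts = {0: 0, 1: 0}
    for _, x, y in train:
        counts[y] += 1
        for j, v in enumerate(x):
            sums[y][j] += v
    return [sums[1][j] / max(counts[1], 1) - sums[0][j] / max(counts[0], 1) for j in range(dim)]


def predict_linear(weights: List[float], x: List[float]) -> float:
    """Score an SAV as the dot product of the toy weights with its features."""
    return sum(w * v for w, v in zip(weights, x))


def make_toy_records(n_proteins: int = 30, savs_per_protein: int = 4, seed: int = 1) -> List[Record]:
    """Synthetic data: feature 0 tracks the label (a stand-in for an informative
    descriptor), feature 1 is pure noise."""
    rng = random.Random(seed)
    records: List[Record] = []
    for p in range(n_proteins):
        for j in range(savs_per_protein):
            y = j % 2
            records.append((f"P{p:03d}", [y + rng.uniform(-0.3, 0.3), rng.uniform(0.0, 1.0)], y))
    return records


if __name__ == "__main__":
    data = make_toy_records()
    score = cross_validated_auroc(data, 10, fit_mean_difference, predict_linear)
    print(f"protein-grouped 10-fold CV AUROC on toy data: {score:.3f}")
```

Grouping by protein means the model never sees other SAVs of the protein it is being tested on; a random SAV-level split would let protein-specific signal leak between training and test sets, which can inflate the measured AUROC.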
