Accurate Identification and Mechanistic Evaluation of Pathogenic Missense Variants with Rhapsody-2.

Anupam Banerjee, Anthony Bogetti, Ivet Bahar
DOI: 10.1101/2025.02.17.638727 (https://doi.org/10.1101/2025.02.17.638727)
Journal: bioRxiv : the preprint server for biology
Publication date: 2025-03-06
Publication type: Journal Article
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11870481/pdf/

Abstract

Understanding the effects of missense mutations, or single amino acid variants (SAVs), on protein function is crucial for elucidating the molecular basis of diseases and disorders and for designing rational therapies. We introduce Rhapsody-2, a machine learning tool for discriminating pathogenic from neutral SAVs, which significantly expands on a precursor that was limited by the availability of structural data. Taking advantage of AlphaFold2 as a powerful tool for structure prediction, Rhapsody-2 is trained on a significantly expanded dataset of 117,525 SAVs from 12,094 human proteins reported in the ClinVar database. Using a broad set of descriptors comprising sequence-evolutionary, structural, dynamic, and energetic features in the training algorithm, Rhapsody-2 achieved an area under the receiver operating characteristic curve (AUROC) of 0.94 in 10-fold cross-validation in which all SAVs of a given test protein were excluded from the training set. Benchmarking against a variety of test datasets demonstrated the high performance of Rhapsody-2. While sequence-evolutionary descriptors play a dominant role in pathogenicity prediction, those based on structural dynamics provide a mechanistic interpretation. Notably, residues involved in allosteric communication, and those distinguished by pronounced fluctuations in the high-frequency modes of motion or subject to spatial constraints in the soft modes, usually give rise to pathogenicity when mutated. Overall, Rhapsody-2 provides an efficient and transparent tool for accurately predicting the pathogenicity of SAVs and unraveling the mechanistic basis of the observed behavior, thus advancing our understanding of genotype-to-phenotype relations.
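The evaluation protocol described in the abstract (cross-validation in which every SAV of a test protein is kept out of the training set, scored by AUROC) can be illustrated with a minimal sketch. This is not the authors' code: the function names `protein_grouped_folds` and `auroc`, the greedy fold balancing, and the toy data are all assumptions made for illustration, and only the Python standard library is used.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def protein_grouped_folds(protein_ids: Sequence[Hashable],
                          n_folds: int = 10) -> List[List[int]]:
    """Assign variant indices to folds so that all variants of one
    protein land in the same fold (hypothetical helper)."""
    by_protein: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, pid in enumerate(protein_ids):
        by_protein[pid].append(idx)

    folds: List[List[int]] = [[] for _ in range(n_folds)]
    # Greedy balancing (an assumption, not from the paper): place the
    # largest proteins first, each into the currently smallest fold.
    for pid in sorted(by_protein, key=lambda p: -len(by_protein[p])):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(by_protein[pid])
    return folds


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen pathogenic
    variant (label 1) scores higher than a randomly chosen neutral
    one (label 0); ties count as 0.5. Quadratic in the number of
    variants, which is fine for a sketch but slow for large sets."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: seven variants from four made-up proteins.
    proteins = ["P1", "P1", "P2", "P3", "P3", "P3", "P4"]
    for test_idx in protein_grouped_folds(proteins, n_folds=3):
        held_out = set(test_idx)
        test_proteins = {proteins[i] for i in held_out}
        train_proteins = {p for i, p in enumerate(proteins)
                          if i not in held_out}
        # No protein contributes variants to both sides of the split.
        assert test_proteins.isdisjoint(train_proteins)
    print(auroc([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.5]))  # prints 0.75
```

Grouping by protein matters because variants of the same protein share many sequence and structure-derived features; a split that ignores this lets the model see close relatives of the test variants during training and tends to inflate the reported AUROC.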
