GPTrans: A Biological Language Model-Based Approach for Predicting Disease-Associated Mutations in G Protein-Coupled Receptors.

IF 5.3 2区 化学 Q1 CHEMISTRY, MEDICINAL Journal of Chemical Information and Modeling Pub Date : 2024-12-23 Epub Date: 2024-11-28 DOI:10.1021/acs.jcim.4c01999
Xiaohua Wang, Ming Zhang, Xibei Yang, Dong-Jun Yu, Fang Ge
{"title":"GPTrans: A Biological Language Model-Based Approach for Predicting Disease-Associated Mutations in G Protein-Coupled Receptors.","authors":"Xiaohua Wang, Ming Zhang, Xibei Yang, Dong-Jun Yu, Fang Ge","doi":"10.1021/acs.jcim.4c01999","DOIUrl":null,"url":null,"abstract":"<p><p>Accurately predicting mutations in G protein-coupled receptors (GPCRs) is critical for advancing disease diagnosis and drug discovery. In response to this imperative, GPTrans has emerged as a highly accurate predictor of disease-related mutations in GPCRs. The core innovation of GPTrans resides in the design of a novel feature extraction network, that is capable of integrating features from both wildtype and mutant protein variant sites, utilizing multifeature connections within a transformer framework to ensure comprehensive feature extraction. A key aspect of GPTrans's effectiveness is our introduction of an innovative deep feature integration strategy, which merges embeddings and class tokens from multiple protein language models, including evolutionary scale modeling and ProtTrans, thus shedding light on the biochemical properties of proteins. Leveraging transformer components and a self-attention mechanism, GPTrans captures higher-level representations of protein features. Employing both wildtype and mutation site information for feature fusion not only enriches the predictive feature set but also avoids the common issue of overestimation associated with sequence-based predictions. This approach distinguishes GPTrans, enabling it to significantly outperform existing methods. Our evaluations across diverse GPCR data sets, including ClinVar and MutHTP, demonstrate GPTrans's superior performance, with average AUC values of 0.874 and 0.590 in 10-fold cross-validation. Notably, compared to the AlphaMissense method, GPTrans exhibited a remarkable 38.03% improvement in accuracy when predicting disease-associated mutations in the MutHTP data set. A thorough analysis of the predicted results further validates the model's effectiveness. The source code, data sets, and prediction results for GPTrans are available for academic use at https://github.com/EduardWang/GPTrans.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"9626-9642"},"PeriodicalIF":5.3000,"publicationDate":"2024-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Chemical Information and Modeling ","FirstCategoryId":"92","ListUrlMain":"https://doi.org/10.1021/acs.jcim.4c01999","RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/11/28 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"CHEMISTRY, MEDICINAL","Score":null,"Total":0}
引用次数: 0

Abstract

Accurately predicting mutations in G protein-coupled receptors (GPCRs) is critical for advancing disease diagnosis and drug discovery. In response to this imperative, GPTrans has emerged as a highly accurate predictor of disease-related mutations in GPCRs. The core innovation of GPTrans resides in the design of a novel feature extraction network, that is capable of integrating features from both wildtype and mutant protein variant sites, utilizing multifeature connections within a transformer framework to ensure comprehensive feature extraction. A key aspect of GPTrans's effectiveness is our introduction of an innovative deep feature integration strategy, which merges embeddings and class tokens from multiple protein language models, including evolutionary scale modeling and ProtTrans, thus shedding light on the biochemical properties of proteins. Leveraging transformer components and a self-attention mechanism, GPTrans captures higher-level representations of protein features. Employing both wildtype and mutation site information for feature fusion not only enriches the predictive feature set but also avoids the common issue of overestimation associated with sequence-based predictions. This approach distinguishes GPTrans, enabling it to significantly outperform existing methods. Our evaluations across diverse GPCR data sets, including ClinVar and MutHTP, demonstrate GPTrans's superior performance, with average AUC values of 0.874 and 0.590 in 10-fold cross-validation. Notably, compared to the AlphaMissense method, GPTrans exhibited a remarkable 38.03% improvement in accuracy when predicting disease-associated mutations in the MutHTP data set. A thorough analysis of the predicted results further validates the model's effectiveness. The source code, data sets, and prediction results for GPTrans are available for academic use at https://github.com/EduardWang/GPTrans.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
GPTrans:基于生物语言模型预测G蛋白偶联受体疾病相关突变的方法
准确预测G蛋白偶联受体(gpcr)的突变对于推进疾病诊断和药物发现至关重要。为了响应这一需求,GPTrans已经成为gpcr中疾病相关突变的高度准确预测因子。GPTrans的核心创新在于设计了一种新的特征提取网络,该网络能够整合野生型和突变型蛋白质变异位点的特征,利用变压器框架内的多特征连接来确保全面的特征提取。GPTrans有效性的一个关键方面是我们引入了一种创新的深度特征集成策略,该策略合并了来自多种蛋白质语言模型的嵌入和类标记,包括进化尺度建模和ProtTrans,从而揭示了蛋白质的生化特性。利用变压器组件和自关注机制,GPTrans捕获蛋白质特征的高级表示。利用野生型和突变位点信息进行特征融合不仅丰富了预测特征集,而且避免了基于序列的预测中常见的高估问题。这种方法与GPTrans不同,使其显著优于现有方法。我们对不同GPCR数据集(包括ClinVar和MutHTP)的评估表明,GPTrans具有优越的性能,10倍交叉验证的平均AUC值为0.874和0.590。值得注意的是,与AlphaMissense方法相比,GPTrans在预测MutHTP数据集中疾病相关突变时的准确性提高了38.03%。对预测结果的深入分析进一步验证了模型的有效性。GPTrans的源代码、数据集和预测结果可在https://github.com/EduardWang/GPTrans上用于学术用途。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
CiteScore
9.80
自引率
10.70%
发文量
529
审稿时长
1.4 months
期刊介绍: The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery. Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field. As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.
期刊最新文献
Machine Learning Prediction for Fe(II) Spin-Crossover Complex in the Same Spin State Using Geometrical and Topological Descriptors. Identification of Amyloid Regions and Mechanisms from Sequence-Based Modeling and Molecular Dynamics Simulation: A Case Study of the Intrinsically Disordered Protein DPF3. Mechanism of c-Cbl Transition from Autoinhibited to Partially Open State via Substrate Binding. Expanding and Enhancing Neural Network-Based QM-AI Models for the Accurate Prediction of Halogen-π Interaction Energies in Protein Contexts. A Multimodal Sequence-to-Sequence Model for Automatic Assignment of ATC Codes in Drug Discovery and Repurposing.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1