PMTPred: machine-learning-based prediction of protein methyltransferases using the composition of k-spaced amino acid pairs

Molecular Diversity · Impact factor 3.9 · Zone 2 (Chemistry) · JCR Q2 (Chemistry, Applied) · Pub Date: 2024-07-21 · DOI: 10.1007/s11030-024-10937-2
Arvind Kumar Yadav, Pradeep Kumar Gupta, Tiratha Raj Singh
Molecular Diversity, volume 28, issue 4, pages 2301-2315. Journal article.

Abstract

Protein methyltransferases (PMTs) are a group of enzymes that catalyze the transfer of a methyl group to their substrates. These enzymes play an important role in epigenetic regulation and can methylate various substrates, including DNA, RNA, proteins, and small-molecule secondary metabolites. Dysregulation of methyltransferases is implicated in various human cancers. Given the well-recognized significance of PMTs, reliable and efficient identification methods are essential. In the present work, we propose a machine-learning-based method for the identification of PMTs. Various sequence-based features were calculated, and prediction models were trained with several machine-learning algorithms under tenfold cross-validation. After evaluating each model on the dataset, the SVM-based model built on the composition of k-spaced amino acid pairs (CKSAAP) achieved the highest prediction accuracy with balanced sensitivity and specificity. This SVM model also outperformed deep-learning algorithms for the prediction of PMTs. In addition, cross-database validation was performed to assess the robustness of the model. Feature importance was assessed using Shapley additive explanations (SHAP) values, providing insights into the contributions of different features to the model's predictions. Finally, the SVM-based CKSAAP model was implemented in a standalone tool, PMTPred, because of its consistent performance during independent testing and cross-database evaluation. We believe that PMTPred will be a useful and efficient tool for the identification of PMTs. PMTPred is freely available for download at https://github.com/ArvindYadav7/PMTPred and http://www.bioinfoindia.org/PMTPred/home.html for research and academic use.
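
To make the CKSAAP descriptor named in the abstract concrete, the following is a minimal sketch of how such features are commonly computed; it is not taken from the PMTPred code. For each gap k, it counts ordered residue pairs separated by k positions and divides by the number of pair windows in the sequence. The function name `cksaap`, the default `k_max=5`, and the choice to skip pairs containing non-standard residues are assumptions made for illustration; the abstract does not state which k values the authors used.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of the 20 standard amino acids, in a fixed order.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(sequence, k_max=5):
    """Return CKSAAP frequencies for gaps k = 0..k_max (assumed range).

    The result is a flat list of length 400 * (k_max + 1).
    """
    seq = sequence.upper()
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        # Number of (i, i + k + 1) windows that fit in the sequence.
        windows = max(len(seq) - k - 1, 0)
        for i in range(windows):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:  # skip pairs with non-standard residues
                counts[pair] += 1
        features.extend(counts[p] / windows if windows else 0.0 for p in PAIRS)
    return features
```

A vector produced this way could then be passed to any standard classifier, such as the SVM mentioned above; the model's actual kernel and hyperparameters are not given in the abstract and are not assumed here.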

Source journal
Molecular Diversity (Chemistry: Multidisciplinary Chemistry)
CiteScore: 7.30
Self-citation rate: 7.90%
Articles published: 219
Review time: 2.7 months
About the journal: Molecular Diversity is a new publication forum for the rapid publication of refereed papers dedicated to describing the development, application and theory of molecular diversity and combinatorial chemistry in basic and applied research and drug discovery. The journal publishes both short and full papers, perspectives, news and reviews dealing with all aspects of the generation of molecular diversity, application of diversity for screening against alternative targets of all types (biological, biophysical, technological), analysis of results obtained and their application in various scientific disciplines/approaches including: combinatorial chemistry and parallel synthesis; small molecule libraries; microwave synthesis; flow synthesis; fluorous synthesis; diversity oriented synthesis (DOS); nanoreactors; click chemistry; multiplex technologies; fragment- and ligand-based design; structure/function/SAR; computational chemistry and molecular design; chemoinformatics; screening techniques and screening interfaces; analytical and purification methods; robotics, automation and miniaturization; targeted libraries; display libraries; peptides and peptoids; proteins; oligonucleotides; carbohydrates; natural diversity; new methods of library formulation and deconvolution; directed evolution, origin of life and recombination; search techniques, landscapes, random chemistry and more.