Improving protein-protein interaction prediction using protein language model and protein network features

IF 2.6 | Zone 4, Biology | Q2, Biochemical Research Methods | Analytical Biochemistry | Pub Date: 2024-04-26 | DOI: 10.1016/j.ab.2024.115550
Jun Hu, Zhe Li, Bing Rao, Maha A. Thafar, Muhammad Arif

Abstract

Interactions between proteins are ubiquitous across a wide variety of biological processes. Accurately identifying protein-protein interactions (PPIs) is important for understanding the mechanisms of protein function and for facilitating drug discovery. Although wet-lab methods are the most reliable way to identify PPIs, they are time-consuming, costly, and labor-intensive. Considerable effort has therefore gone into developing computational methods that improve PPI prediction. In this study, we propose a novel hybrid computational method, called KSGPPI, that aims to improve PPI prediction by extracting discriminative information from protein sequences and interaction networks. The KSGPPI model comprises two feature extraction modules. In the first module, a large protein language model, ESM-2, is employed to exploit the global complex patterns concealed within protein sequences. Feature representations are then further extracted through CKSAAP, and a two-dimensional convolutional neural network (CNN) is used to capture local information. In the second module, the sequence alignment tool NW-align is used to find a protein in the STRING database that is similar to the query protein, and the Node2vec algorithm is then applied to the interaction network of that similar protein to obtain a graph embedding feature for the query protein. Finally, the features from the two modules are fused, and the fused features are fed into a multilayer perceptron to predict PPIs. Five-fold cross-validation on the benchmark datasets shows that KSGPPI achieves an average prediction accuracy of 88.96%. Additionally, the average Matthews correlation coefficient of KSGPPI (0.781) is significantly higher than that of state-of-the-art PPI prediction methods.
The standalone package of KSGPPI is freely available for download at https://github.com/rickleezhe/KSGPPI.
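
To make the CKSAAP step more concrete, the following is a minimal Python sketch of the commonly used sequence-based definition of CKSAAP (composition of k-spaced amino acid pairs): for each gap k, count the residue pairs separated by k positions and normalize the counts. It is not taken from the KSGPPI package. The function name `cksaap`, the default `k_max=3`, and the choice to skip pairs containing non-standard residues are assumptions made for illustration, and the abstract does not specify how KSGPPI combines CKSAAP with the ESM-2 representations.

```python
from itertools import product

# The 20 standard amino acids and all 400 ordered residue pairs.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
PAIR_INDEX = {pair: i for i, pair in enumerate(PAIRS)}


def cksaap(sequence, k_max=3):
    """Return the CKSAAP feature vector of a protein sequence.

    For each gap k in 0..k_max, count every ordered pair of residues
    separated by exactly k positions, then divide by the number of
    counted pairs. The result has 400 * (k_max + 1) entries.

    Assumption (not from the source): pairs that contain a
    non-standard residue (e.g. 'X') are skipped.
    """
    features = []
    for k in range(k_max + 1):
        counts = [0] * len(PAIRS)
        total = 0
        # Positions i and i + k + 1 form one k-spaced pair.
        for i in range(max(len(sequence) - k - 1, 0)):
            idx = PAIR_INDEX.get(sequence[i] + sequence[i + k + 1])
            if idx is not None:
                counts[idx] += 1
                total += 1
        # Normalize so each gap's 400 entries sum to 1 (or stay 0
        # when the sequence is too short to contain any pair).
        features.extend(c / total if total else 0.0 for c in counts)
    return features


if __name__ == "__main__":
    vec = cksaap("MKTAYIAKQRQISFVKSHFSRQ", k_max=3)
    print(len(vec))  # 400 entries per gap value
```

Because each gap contributes a fixed block of 400 values, the output length depends only on `k_max` and not on the sequence length, which is what makes this encoding convenient as input to a fixed-size network.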

Source journal: Analytical Biochemistry (Biology: Analytical Chemistry)
CiteScore: 5.70
Self-citation rate: 0.00%
Articles published: 283
Review time: 44 days
Journal description: The journal's title, Analytical Biochemistry: Methods in the Biological Sciences, declares its broad scope: methods for the basic biological sciences, including biochemistry, molecular genetics, cell biology, proteomics, immunology, bioinformatics, and wherever the frontiers of research take the field. The emphasis is on methods ranging from the strictly analytical to the more preparative, which would include novel approaches to protein purification as well as improvements in cell and organ culture. The techniques covered are equally inclusive, ranging from aptamers to zymology. The journal has been particularly active in: analytical techniques for biological molecules; aptamer selection and utilization; biosensors; chromatography; cloning, sequencing and mutagenesis; electrochemical methods; electrophoresis; enzyme characterization methods; immunological approaches; mass spectrometry of proteins and nucleic acids; metabolomics; nano-level techniques; and optical spectroscopy in all its forms. The journal is reluctant to include most drug and strictly clinical studies, as there are more suitable publication platforms for these types of papers.