TranP-B-site: A Transformer Enhanced Method for prediction of binding sites of Protein-protein interactions

Measurement · IF 6.1 · Zone 2 (Engineering Technology) · JCR Q1 (Engineering, Multidisciplinary) · Pub Date: 2025-06-30 · Epub Date: 2025-03-15 · DOI: 10.1016/j.measurement.2025.117227
Sharzil Haris Khan, Hilal Tayara, Kil To Chong
Measurement, volume 251, Article 117227 (journal article; not open access). Full text: https://www.sciencedirect.com/science/article/pii/S026322412500586X
Citations: 0

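Abstract

Protein-protein interactions (PPIs) govern essential biological processes and rely on specific binding sites that drive the molecular machinery of cells. Identifying these binding sites is crucial, and computational methods are emerging as efficient alternatives to labor-intensive experimental approaches. Existing techniques use the sequence and structural information of amino acids, but the limited availability of protein structural data in databases makes sequence-based models more practical. The proposed model, TranP-B-site, applies a convolutional neural network (CNN) to transformer embeddings of the amino acid sequence to predict PPI binding sites. First, two types of features are extracted for each amino acid in a protein sequence: a one-hot encoding, which represents low-level features, and a transformer-based embedding, which carries information about the entire protein sequence. The per-residue one-hot encodings and the per-residue embeddings are concatenated to form two matrices. Then, two local feature sets are created by applying a windowing technique to these matrices. The amino acid embedding-based local feature set is fed into a CNN, while the one-hot encoding-based local feature set is fed into a neural network. Finally, a sub-neural network classifies the concatenated output of the CNN and the neural network. On an independent dataset, the proposed model improves MCC by 3% and accuracy by 7% over the previous state-of-the-art sequence-based model. In addition, a new test dataset was curated from recently published protein sequences in the PDB database, and the proposed model outperformed other state-of-the-art models on it.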
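The one-hot and windowing steps described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 20-letter residue alphabet, the zero-padding at the sequence ends, the half-width of 1, and the function names `one_hot_encode` and `window_features` are all assumptions, since the abstract does not state them. The same windowing function would apply unchanged to a matrix of transformer embeddings.

```python
from typing import List

# Assumption: the 20 standard amino acids; the abstract does not give the alphabet.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence: str) -> List[List[float]]:
    """Return an L x 20 matrix; unknown residues become all-zero rows."""
    matrix = []
    for residue in sequence.upper():
        row = [0.0] * len(AMINO_ACIDS)
        idx = AA_INDEX.get(residue)
        if idx is not None:
            row[idx] = 1.0
        matrix.append(row)
    return matrix


def window_features(matrix: List[List[float]], half_width: int) -> List[List[float]]:
    """Build one local feature vector per residue.

    Each vector flattens rows center-half_width .. center+half_width.
    Positions past either end are zero-padded (an assumed padding scheme).
    """
    if half_width < 0:
        raise ValueError("half_width must be non-negative")
    n_cols = len(matrix[0]) if matrix else 0
    pad = [0.0] * n_cols
    windows = []
    for center in range(len(matrix)):
        flat: List[float] = []
        for pos in range(center - half_width, center + half_width + 1):
            flat.extend(matrix[pos] if 0 <= pos < len(matrix) else pad)
        windows.append(flat)
    return windows


if __name__ == "__main__":
    local = window_features(one_hot_encode("MKTAY"), half_width=1)
    print(len(local), len(local[0]))  # 5 60
```

Each residue gets its own fixed-length vector regardless of protein length, which is what lets a per-residue classifier be trained on proteins of different sizes.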

摘要图片

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
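The abstract reports its gains in MCC (Matthews correlation coefficient) and accuracy. As a reminder of how the two metrics are computed for a binary binding/non-binding labelling, here is a small sketch. The label vectors are made-up illustration data, not results from the paper, and the helper names are hypothetical. MCC is commonly reported alongside accuracy because binding residues are typically the minority class, where accuracy alone can look deceptively high.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels, with 1 = binding site."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of residues classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 by convention when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Illustrative labels only (1 = binding residue, 0 = non-binding).
    y_true = [1, 0, 0, 1, 0, 0, 0, 1]
    y_pred = [1, 0, 0, 0, 0, 1, 0, 1]
    counts = confusion_counts(y_true, y_pred)
    print(accuracy(*counts))       # 0.75
    print(round(mcc(*counts), 3))  # 0.467
```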
Source journal
Measurement (Engineering Technology - Engineering: Multidisciplinary)
CiteScore: 10.20
Self-citation rate: 12.50%
Articles published: 1589
Review time: 12.1 months
Journal description: Contributions are invited on novel achievements in all fields of measurement and instrumentation science and technology. Authors are encouraged to submit novel material whose ultimate goal is an advancement in the state of the art of: measurement and metrology fundamentals, sensors, measurement instruments, measurement and estimation techniques, measurement data processing and fusion algorithms, evaluation procedures and methodologies for plants and industrial processes, performance analysis of systems, processes and algorithms, mathematical models for measurement-oriented purposes, distributed measurement systems in a connected world.