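Identifying Protein Phosphorylation Site-Disease Associations Based on Multi-Similarity Fusion and Negative Sample Selection by Convolutional Neural Network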

Interdisciplinary Sciences: Computational Life Sciences (IF 3.9; Zone 2, Biology; JCR Q1, MATHEMATICAL & COMPUTATIONAL BIOLOGY). Pub Date: 2024-09-01. Epub Date: 2024-03-08. DOI: 10.1007/s12539-024-00615-0
Qian Deng, Jing Zhang, Jie Liu, Yuqi Liu, Zong Dai, Xiaoyong Zou, Zhanchao Li
Pages: 649-664. Publication type: Journal Article. Platform: Semantic Scholar. URL: https://doi.org/10.1007/s12539-024-00615-0
Citations: 0

Abstract


Identifying Protein Phosphorylation Site-Disease Associations Based on Multi-Similarity Fusion and Negative Sample Selection by Convolutional Neural Network.

As one of the most important post-translational modifications (PTMs), protein phosphorylation plays a key role in a variety of biological processes, and many studies have shown that it is associated with various human diseases. Identifying protein phosphorylation site-disease associations can therefore help to elucidate disease pathogenesis and discover new drug targets. For phosphorylation sites, networks of sequence similarity and Gaussian interaction profile kernel similarity were constructed; for diseases, networks of disease semantic similarity, disease symptom similarity and Gaussian interaction profile kernel similarity were constructed. To effectively combine the different phosphorylation site and disease similarity information, the random walk with restart algorithm was used to obtain the topology information of the networks. Then, the diffusion component analysis method was used to obtain the comprehensive phosphorylation site similarity and disease similarity. Meanwhile, reliable negative samples were screened using a Euclidean distance method. Finally, a convolutional neural network (CNN) model was constructed to identify potential associations between phosphorylation sites and diseases. Under tenfold cross-validation, the model achieved an accuracy of 93.48%, specificity of 96.82%, sensitivity of 90.15%, precision of 96.62%, Matthews correlation coefficient of 0.8719, area under the receiver operating characteristic curve of 0.9786 and area under the precision-recall curve of 0.9836. Additionally, most of the top 20 predicted disease-related phosphorylation sites (19/20 for Alzheimer's disease; 16/20 for neuroblastoma) were verified by the literature and databases. These results show that the proposed method has outstanding prediction performance and high practical value.
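The first two steps named in the abstract, Gaussian interaction profile kernel similarity and random walk with restart, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no parameters, so the bandwidth normalisation (a common convention for this kernel), the restart probability of 0.5, the function names and the toy association matrix are all assumptions made for illustration.

```python
import numpy as np


def gip_kernel(assoc):
    """Gaussian interaction profile kernel similarity between the rows
    of a binary association matrix (rows = sites, columns = diseases)."""
    assoc = np.asarray(assoc, dtype=float)
    sq_norms = (assoc ** 2).sum(axis=1)
    # Assumed convention: bandwidth scaled by the mean squared profile norm.
    # Requires at least one known association (mean norm > 0).
    gamma = 1.0 / sq_norms.mean()
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * assoc @ assoc.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def random_walk_with_restart(sim, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * e for every start node at once.
    Column j of the result is the diffusion state of node j."""
    sim = np.asarray(sim, dtype=float)
    col_sums = sim.sum(axis=0, keepdims=True)
    # Column-normalise the similarity matrix into a transition matrix.
    trans = sim / np.where(col_sums == 0, 1.0, col_sums)
    start = np.eye(sim.shape[0])
    state = start.copy()
    for _ in range(max_iter):
        nxt = (1.0 - restart) * trans @ state + restart * start
        converged = np.abs(nxt - state).max() < tol
        state = nxt
        if converged:
            break
    return state


# Hypothetical toy data: 4 phosphorylation sites x 3 diseases.
toy_assoc = np.array([[1, 0, 1],
                      [1, 0, 0],
                      [0, 1, 1],
                      [0, 1, 0]])
site_sim = gip_kernel(toy_assoc)
diffusion = random_walk_with_restart(site_sim, restart=0.5)
```

In a pipeline like the one described, each similarity network would yield such a diffusion-state matrix, and a later step (diffusion component analysis in the paper) would compress these states into low-dimensional features; that step is not sketched here.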

Source journal
Interdisciplinary Sciences: Computational Life Sciences (MATHEMATICAL & COMPUTATIONAL BIOLOGY)
CiteScore: 8.60
Self-citation rate: 4.20%
Articles published: 55
Journal description: Interdisciplinary Sciences: Computational Life Sciences aims to cover the most recent and outstanding developments in interdisciplinary areas of science, with a particular focus on computational life sciences, an area that is developing rapidly at the forefront of scientific research and technology. The journal publishes original papers of significant general interest covering recent research and developments. Articles are published rapidly by taking full advantage of internet technology for online submission and peer review of manuscripts, and then by publishing Online First™ through SpringerLink even before the issue is built or sent to the printer. The editorial board consists of many leading scientists with international reputations, among others Luc Montagnier (UNESCO, France), Dennis Salahub (University of Calgary, Canada) and Weitao Yang (Duke University, USA). Prof. Dongqing Wei of Shanghai Jiao Tong University is the editor-in-chief; he has made important contributions in bioinformatics and computational physics and is best known for his ground-breaking work on the theory of ferroelectric liquids. With the help of a team of associate editors and the editorial board, the journal aims to build a sound international reputation.
Latest articles in this journal
Adap-BDCM: Adaptive Bilinear Dynamic Cascade Model for Classification Tasks on CNV Datasets.
CVGAE: A Self-Supervised Generative Method for Gene Regulatory Network Inference Using Single-Cell RNA Sequencing Data.
Unraveling Brain Synchronisation Dynamics by Explainable Neural Networks using EEG Signals: Application to Dyslexia Diagnosis.
Ensemble Machine Learning and Predicted Properties Promote Antimicrobial Peptide Identification.
Viral Rebound After Antiviral Treatment: A Mathematical Modeling Study of the Role of Antiviral Mechanism of Action.