Identifying protein subcellular location with embedding features learned from networks

IF 0.5 | Zone 4 (Biology) | Q4 | BIOCHEMICAL RESEARCH METHODS | Current Proteomics | Pub Date: 2020-11-24 | DOI: 10.2174/1570164617999201124142950

Hongwei Liu, Bin Hu, Lei Chen, Lin Lu


Citations: 36

Abstract

Identifying protein subcellular location is an important problem because subcellular location is closely related to protein function. Biological experiments are the fundamental way to determine these locations, but they are costly and time-consuming. An alternative is to design effective computational methods.

Several computational methods have been proposed to date, but they mainly adopt features derived from the proteins themselves. Meanwhile, with the development of network techniques, several embedding algorithms have been proposed that encode the nodes of a network into feature vectors. Such algorithms connect networks with traditional classification algorithms and thus provide a new way to construct models for predicting protein subcellular location.

In this study, we analyzed features produced by three network embedding algorithms (DeepWalk, Node2vec and Mashup) applied to one or multiple protein networks. The resulting features were fed to a machine learning algorithm (support vector machine or random forest) to construct a model, and all constructed models were evaluated by cross-validation.

The cross-validation results showed that the embedding features yielded by Mashup on multiple networks were quite informative for predicting protein subcellular location, and the model based on these features was superior to some classic models.

Embedding features yielded by a proper and powerful network embedding algorithm were effective for building models to predict protein subcellular location, providing new pipelines for building more efficient models.
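As a rough illustration of how a network embedding algorithm starts from a protein network, the sketch below samples truncated random walks, which is the corpus-building step DeepWalk uses before training a skip-gram model on the walks (Node2vec biases the walks; Mashup works differently and is not shown). This is not the authors' implementation: the toy network, the protein names, and the parameter values `walks_per_node` and `walk_length` are assumptions chosen for the example, and the embedding and classification steps are omitted.

```python
import random


def random_walks(adjacency, walks_per_node=2, walk_length=5, seed=0):
    """Sample DeepWalk-style truncated random walks from an undirected graph.

    adjacency maps each node to a list of its neighbors.
    """
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a fresh random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:
                    break  # isolated node: the walk cannot continue
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Hypothetical toy protein-protein interaction network (names are made up).
toy_network = {
    "P1": ["P2", "P3"],
    "P2": ["P1", "P3"],
    "P3": ["P1", "P2", "P4"],
    "P4": ["P3"],
}

walks = random_walks(toy_network)
for walk in walks[:3]:
    print(" -> ".join(walk))
```

In a full pipeline, each walk would be treated as a sentence for a skip-gram model, and the learned node vectors would then serve as input features for a classifier such as a support vector machine or random forest, evaluated by cross-validation.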



Source journal

Current Proteomics BIOCHEMICAL RESEARCH METHODS-BIOCHEMISTRY & MOLECULAR BIOLOGY

CiteScore

1.60

Self-citation rate

0.00%

Articles published

Review time

>0 weeks

Journal description: Research in the emerging field of proteomics is growing at an extremely rapid rate. The principal aim of Current Proteomics is to publish well-timed in-depth/mini review articles in this fast-expanding area on topics relevant and significant to the development of proteomics. Current Proteomics is an essential journal for everyone involved in proteomics and related fields in both academia and industry. Current Proteomics publishes in-depth/mini review articles in all aspects of the fast-expanding field of proteomics. All areas of proteomics are covered together with the methodology, software, databases, technological advances and applications of proteomics, including functional proteomics. Diverse technologies covered include but are not limited to: protein separation and characterization techniques; 2-D gel electrophoresis and image analysis; techniques for protein expression profiling, including mass spectrometry-based methods and algorithms for correlative database searching; determination of co-translational and post-translational modification of proteins; protein/peptide microarrays; biomolecular interaction analysis; analysis of protein complexes; yeast two-hybrid projects; protein-protein interaction (protein interactome) pathways and cell signaling networks; systems biology; proteome informatics (bioinformatics); knowledge integration and management tools; high-throughput protein structural studies (using mass spectrometry, nuclear magnetic resonance and X-ray crystallography); high-throughput computational methods for protein 3-D structure as well as function determination; robotics, nanotechnology, and microfluidics.