NetGO 3.0: Protein Language Model Improves Large-scale Functional Annotations

Journal: Genomics, Proteomics & Bioinformatics | Pub Date: 2023-04-01 | DOI: 10.1016/j.gpb.2023.04.001 | IF 11.5 | Zone 2 (Biology) | JCR Q1 (Genetics & Heredity)
Shaojun Wang, Ronghui You, Yunjia Liu, Yi Xiong, Shanfeng Zhu
Genomics, Proteomics & Bioinformatics, Volume 21, Issue 2, Pages 349-358.
Citations: 6

Abstract

As one of the state-of-the-art automated function prediction (AFP) methods, NetGO 2.0 integrates multi-source information to improve performance. However, it mainly uses proteins with experimentally supported functional annotations and does not leverage the valuable information in the vast number of unannotated proteins. Recently, protein language models have been proposed to learn informative representations [e.g., Evolutionary Scale Modeling (ESM)-1b embeddings] from protein sequences through self-supervision. Here, we represented each protein by its ESM-1b embedding and used logistic regression (LR) to train a new model, LR-ESM, for AFP. The experimental results showed that LR-ESM achieved performance comparable to that of the best-performing component of NetGO 2.0. Therefore, by incorporating LR-ESM into NetGO 2.0, we developed NetGO 3.0 to extensively improve the performance of AFP. NetGO 3.0 is freely accessible at https://dmiip.sjtu.edu.cn/ng3.0.
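
The LR-ESM idea in the abstract (a fixed embedding per protein, then logistic regression for function prediction) can be illustrated with a minimal sketch. It assumes one independent binary logistic regression per GO term, trained by plain gradient descent. The 3-dimensional toy vectors merely stand in for ESM-1b embeddings, which are not computed here. The function names, the placeholder term labels `TERM_A` and `TERM_B`, and the hyperparameters are all hypothetical and are not taken from NetGO 3.0.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_binary_lr(X, y, lr=0.5, epochs=500):
    """Fit one logistic regression by batch gradient descent on the log loss."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, t in zip(X, y):
            # Prediction error for this protein under the current weights.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def train_per_term(X, annotations, terms):
    """Train an independent binary classifier for each term (assumed setup)."""
    return {
        term: train_binary_lr(X, [1.0 if term in a else 0.0 for a in annotations])
        for term in terms
    }


def predict_scores(models, x):
    """Return a score in (0, 1) for every term, given one embedding."""
    return {
        term: sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        for term, (w, b) in models.items()
    }


# Toy data: made-up vectors standing in for protein embeddings,
# each paired with a made-up set of annotated terms.
EMBEDDINGS = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.1],
    [0.1, 1.0, 0.0],
    [0.0, 0.9, 0.2],
]
ANNOTATIONS = [{"TERM_A"}, {"TERM_A"}, {"TERM_B"}, {"TERM_B"}]
TERMS = ["TERM_A", "TERM_B"]

models = train_per_term(EMBEDDINGS, ANNOTATIONS, TERMS)
query = [0.95, 0.15, 0.05]  # hypothetical embedding of an unannotated protein
ranked = sorted(predict_scores(models, query).items(), key=lambda kv: -kv[1])
for term, score in ranked:
    print(f"{term}\t{score:.3f}")
```

In this sketch the language model acts only as a frozen feature extractor: all learning from annotated proteins happens in the per-term classifiers, while the embedding would carry whatever was learned from unannotated sequences.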

Source journal: Genomics, Proteomics & Bioinformatics
Subject area: Biochemistry, Genetics and Molecular Biology - Biochemistry
CiteScore: 14.30
Self-citation rate: 4.20%
Articles published: 844
Review time: 61 days
About the journal: Genomics, Proteomics and Bioinformatics (GPB) is the official journal of the Beijing Institute of Genomics, Chinese Academy of Sciences / China National Center for Bioinformation and Genetics Society of China. It aims to disseminate new developments in the field of omics and bioinformatics, publish high-quality discoveries quickly, and promote open access and online publication. GPB welcomes submissions in all areas of life science, biology, and biomedicine, with a focus on large data acquisition, analysis, and curation. Manuscripts covering omics and related bioinformatics topics are particularly encouraged. GPB is indexed/abstracted by PubMed/MEDLINE, PubMed Central, Scopus, BIOSIS Previews, Chemical Abstracts, and CSCD, among others.