VotePLMs-AFP: Identification of antifreeze proteins using transformer-embedding features and ensemble learning

IF 2.8 | Zone 3, Biology | JCR Q3, Biochemistry & Molecular Biology | Biochimica et biophysica acta. General subjects | Pub Date: 2024-10-18 | DOI: 10.1016/j.bbagen.2024.130721
Dawei Qi, Taigang Liu
Citations: 0

Abstract

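Antifreeze proteins (AFPs) are a distinct class of biomolecules that protect other proteins, cell membranes, and cellular structures within organisms from damage caused by freezing. Because AFPs matter in biotechnology, agriculture, and medicine, several machine learning methods have been developed to identify them. However, the complexity and diversity of AFPs limit the predictive performance of existing methods, so an efficient, rapid, and accurate computational method for predicting AFPs is still needed. In this study, we propose VotePLMs-AFP, a predictor that identifies AFPs using transformer-embedding features and ensemble learning. First, three types of feature descriptors are extracted from pre-trained protein language models (PLMs). Next, we analyze six combinations generated from these three embeddings to find the optimal feature set, which is then fed into a soft voting-based ensemble classifier. Finally, we evaluate the model on two benchmark datasets. The experimental results show that the model achieves high prediction accuracy in both 10-fold cross-validation (CV) and independent-set testing, outperforming existing state-of-the-art methods. The model could therefore serve as an effective tool for predicting AFPs.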
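
The two mechanisms named in the abstract, combining embeddings into one feature set and soft voting, can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation. The abstract does not name the PLMs or the base classifiers, so the embedding names (`plm_a`, `plm_b`, `plm_c`), the toy vectors, and the per-model probabilities below are all hypothetical. Combining embeddings by concatenation and averaging probabilities with equal weights are also assumptions.

```python
from typing import Dict, List, Sequence


def concat_features(embeddings: Dict[str, List[float]],
                    names: Sequence[str]) -> List[float]:
    """Concatenate the selected embedding vectors into one feature vector.

    Assumption: embeddings are combined by simple concatenation.
    """
    combined: List[float] = []
    for name in names:
        combined.extend(embeddings[name])
    return combined


def soft_vote(probabilities: Sequence[Sequence[float]]) -> List[float]:
    """Average per-class probabilities across classifiers (equal weights)."""
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    return [sum(p[c] for p in probabilities) / n_models
            for c in range(n_classes)]


def predict_label(probabilities: Sequence[Sequence[float]],
                  labels: Sequence[str] = ("non-AFP", "AFP")) -> str:
    """Return the label whose averaged probability is highest."""
    averaged = soft_vote(probabilities)
    best = max(range(len(averaged)), key=averaged.__getitem__)
    return labels[best]


# Hypothetical per-protein embeddings from three PLMs (toy sizes).
embeddings = {
    "plm_a": [0.1, 0.2],
    "plm_b": [0.3],
    "plm_c": [0.4, 0.5, 0.6],
}
features = concat_features(embeddings, ["plm_a", "plm_c"])

# Hypothetical [P(non-AFP), P(AFP)] outputs from three base classifiers.
# Two models weakly favour AFP; one strongly favours non-AFP.
model_outputs = [[0.45, 0.55], [0.45, 0.55], [0.90, 0.10]]
averaged = soft_vote(model_outputs)
label = predict_label(model_outputs)
```

In this toy case a majority (hard) vote would pick AFP, because two of the three models lean that way, while the soft vote picks non-AFP because the third model is far more confident. Soft voting lets classifier confidence, not just the count of votes, decide the outcome.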

Source journal
Biochimica et biophysica acta. General subjects (Biology: Biochemistry & Molecular Biology)
CiteScore: 6.40
Self-citation rate: 0.00%
Articles published: 139
Review time: 30 days
About the journal: BBA General Subjects accepts for submission either original, hypothesis-driven studies or reviews covering subjects in biochemistry and biophysics that are considered to have general interest for a wide audience. Manuscripts with interdisciplinary approaches are especially encouraged.