HPClas: A data‐driven approach for identifying halophilic proteins based on catBoost

Journal: mLife (IF 4.5; JCR Q1, Microbiology). Pub Date: 2024-07-20. DOI: 10.1002/mlf2.12125
Shantong Hu, Xiao-Yong Wang, Zhikang Wang, Menghan Jiang, Shihui Wang, Wenya Wang, Jiangning Song, Guimin Zhang

Abstract

Halophilic proteins possess unique structural properties and remain highly stable under extreme conditions. This makes them valuable for applications such as bioenergy, pharmaceuticals, environmental clean-up, and energy production. Halophilic proteins are generally discovered and characterized through labor-intensive, time-consuming wet-lab experiments. In this study, we introduce the Halophilic Protein Classifier (HPClas), a machine learning-based classifier developed using the catBoost ensemble learning technique to identify halophilic proteins. Extensive in silico calculations were conducted on a large public dataset of 12,574 samples, and HPClas achieved an area under the receiver operating characteristic curve (AUROC) of 0.844 on an independent test set of 200 samples. The source code and curated dataset of HPClas are publicly available at https://github.com/Showmake2/HPClas. In conclusion, HPClas can be explored as a promising tool to aid the identification of halophilic proteins and accelerate their application in different fields.
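For readers unfamiliar with the reported metric, the following is a minimal sketch of how an AUROC value can be computed from classifier scores. It is not the authors' evaluation code: the function name `auroc` and the toy labels and scores are assumptions made purely for illustration, and the pairwise formulation is chosen for clarity rather than speed.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative sample.
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the paper): 1 = halophilic, 0 = non-halophilic.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(toy_labels, toy_scores))  # 0.75
```

Under this reading, the reported AUROC of 0.844 means that, on the independent test set, a halophilic protein is ranked above a non-halophilic one in roughly 84% of positive/negative pairs.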