Prediction of Solubility of Proteins in Escherichia coli Based on Functional and Structural Features Using Machine Learning Methods

Feiming Huang, Qian Gao, XianChao Zhou, Wei Guo, KaiYan Feng, Lin Zhu, Tao Huang, Yu-Dong Cai
Journal: The Protein Journal
Published: 2024-09-07
DOI: 10.1007/s10930-024-10230-z (https://doi.org/10.1007/s10930-024-10230-z)
Citations: 0

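Abstract

Protein solubility is a critical parameter that determines the stability, activity, and functionality of proteins, with far-reaching implications in biotechnology and biochemistry. Accurate prediction and control of protein solubility are essential for successful protein expression and purification in research and industrial settings. This study gathered information on soluble and insoluble proteins. The proteins were mapped to STRING and characterized by functional and structural features. All functional/structural features were integrated into a 5768-dimensional binary vector that encodes each protein. Seven feature-ranking algorithms were used to analyze the functional/structural features, yielding seven feature lists. Each list was then subjected to incremental feature selection with four classification algorithms, one by one, to build effective classification models and to identify the features most important for classification. Several essential functional/structural features that differentiate soluble from insoluble proteins were identified, including GO:0009987 (intercellular communication) and GO:0022613 (ribonucleoprotein complex biogenesis). The best classification model, which used a support vector machine as the classification algorithm with 295 optimized functional/structural features, achieved an F1 score of 0.825 and can serve as a powerful tool for differentiating soluble proteins from insoluble proteins.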
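The incremental feature selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy data, the frequency-gap ranking, the nearest-centroid classifier, and the single hold-out split are all assumptions chosen so the example runs with the Python standard library alone. The study itself used seven ranking algorithms and four classifiers (including a support vector machine) on 5768-dimensional binary vectors. All function names below are hypothetical.

```python
import random


def f1_score(y_true, y_pred):
    """F1: harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def rank_features(X, y):
    """Stand-in ranking (assumption): sort features by the absolute gap in
    how often they are set in positive versus negative samples."""
    n_pos = sum(y)
    n_neg = len(y) - n_pos

    def gap(j):
        pos = sum(row[j] for row, label in zip(X, y) if label == 1) / n_pos
        neg = sum(row[j] for row, label in zip(X, y) if label == 0) / n_neg
        return abs(pos - neg)

    return sorted(range(len(X[0])), key=gap, reverse=True)


def centroid_predict(X_train, y_train, X_test, feats):
    """Stand-in classifier (assumption): nearest class centroid on the chosen features."""
    def centroid(label):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        return [sum(r[j] for r in rows) / len(rows) for j in feats]

    c1, c0 = centroid(1), centroid(0)
    preds = []
    for r in X_test:
        d1 = sum((r[j] - c) ** 2 for j, c in zip(feats, c1))
        d0 = sum((r[j] - c) ** 2 for j, c in zip(feats, c0))
        preds.append(1 if d1 < d0 else 0)
    return preds


def incremental_feature_selection(X_train, y_train, X_val, y_val, ranked, step=1):
    """Evaluate the top-k ranked features for growing k; keep the k with the best F1."""
    best_k, best_f1 = 0, -1.0
    for k in range(step, len(ranked) + 1, step):
        preds = centroid_predict(X_train, y_train, X_val, ranked[:k])
        score = f1_score(y_val, preds)
        if score > best_f1:
            best_k, best_f1 = k, score
    return best_k, best_f1


def make_toy_data(n_samples=200, n_features=20, n_informative=3, seed=0):
    """Synthetic binary vectors (assumption): only the first few features carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # alternate labels so both classes appear in every split
        row = []
        for j in range(n_features):
            p = (0.9 if label == 1 else 0.1) if j < n_informative else 0.5
            row.append(1 if rng.random() < p else 0)
        X.append(row)
        y.append(label)
    return X, y


X, y = make_toy_data()
X_train, y_train, X_val, y_val = X[:140], y[:140], X[140:], y[140:]
ranked = rank_features(X_train, y_train)
best_k, best_f1 = incremental_feature_selection(X_train, y_train, X_val, y_val, ranked)
print(f"best feature count: {best_k}, validation F1: {best_f1:.3f}")
```

In practice the hold-out split would likely be replaced by cross-validation, and the loop would be repeated for each ranked list and each classifier, keeping the combination with the highest F1.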