BPAGS: a web application for bacteriocin prediction via feature evaluation using alternating decision tree, genetic algorithm, and linear support vector classifier

Frontiers in Bioinformatics (IF 2.8, Q2, Mathematical & Computational Biology). Pub Date: 2024-01-10. DOI: 10.3389/fbinf.2023.1284705
Suraiya Akhter, John H. Miller

Abstract

Bacteriocins have emerged as a promising basis for new drugs to combat antibiotic resistance, because they can kill bacteria with either broad or narrow natural spectra of activity. This creates a need for a precise and efficient computational model that can predict novel bacteriocins. Machine learning can learn patterns and features from bacteriocin sequences that are difficult to capture with sequence matching-based methods, which makes it a potentially superior choice for accurate prediction. In this study, we created a web application for predicting bacteriocins using a machine learning approach. The feature sets used in the application were chosen with feature evaluation methods based on alternating decision tree (ADTree), genetic algorithm (GA), and linear support vector classifier (linear SVC). We first extracted candidate features from the physicochemical, structural, and sequence-profile attributes of both bacteriocin and non-bacteriocin protein sequences. We then assessed the candidate features using the Pearson correlation coefficient, followed by separate evaluations with ADTree, GA, and linear SVC to eliminate unnecessary features. Finally, we constructed random forest (RF), support vector machine (SVM), decision tree (DT), logistic regression (LR), k-nearest neighbors (KNN), and Gaussian naïve Bayes (GNB) models using the reduced feature sets. The top-performing model overall was an SVM with ADTree-reduced features, which achieved an accuracy of 99.11% and an AUC of 0.9984 on the testing dataset. We also compared the best-performing model for each reduced feature set against our previously developed software solution, a sequence alignment-based tool, and a deep-learning approach.
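The abstract does not say how the Pearson correlation coefficient was applied. One common use, sketched here purely as an assumption, is to drop one feature from each highly correlated pair before the ADTree, GA, or linear SVC evaluation. The 0.9 threshold, the feature names, and the toy values are hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant column; treat as 0
    return cov / sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.9):
    """Keep a feature only if |r| with every already-kept feature is below threshold.

    `columns` maps a feature name to its list of values (one value per sequence).
    The threshold is an assumed value, not one reported in the abstract.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy, made-up feature columns (hypothetical names, not data from the paper).
features = {
    "length": [10.0, 20.0, 30.0, 40.0, 50.0],
    "mol_weight": [1.1, 2.0, 3.2, 4.1, 5.0],    # nearly proportional to length
    "net_charge": [1.0, -1.0, 1.0, -1.0, 1.0],  # uncorrelated with length
}
selected = drop_correlated(features)
print(selected)
```

In this sketch the first feature of a correlated pair wins, so the result depends on column order. A real pipeline might instead keep whichever feature correlates more strongly with the class label.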
We developed a web application, BPAGS (Bacteriocin Prediction based on ADTree, GA, and linear SVC), that incorporates the predictive models built on the ADTree, GA, and linear SVC-based feature sets. The tool currently reports classification results with associated probability values and offers the option to add new samples to the training data to improve predictive performance. BPAGS is freely accessible at https://shiny.tricities.wsu.edu/bacteriocin-prediction/.
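To make the reported metrics concrete, the following sketch shows how predicted probability values like those the tool returns relate to accuracy and AUC. It is not the authors' evaluation code. The 0.5 decision threshold, the label convention (1 = bacteriocin), and the example values are assumptions for illustration.

```python
def accuracy(labels, probs, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the true label."""
    predicted = [1 if p >= threshold else 0 for p in probs]
    correct = sum(1 for y, y_hat in zip(labels, predicted) if y == y_hat)
    return correct / len(labels)


def auc(labels, probs):
    """AUC as the chance that a random positive is scored above a random negative.

    A tie between a positive and a negative counts as half a win.
    """
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = bacteriocin, 0 = non-bacteriocin) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.95, 0.80, 0.40, 0.60, 0.20, 0.05]

print(accuracy(y_true, y_prob))
print(auc(y_true, y_prob))
```

Accuracy depends on the chosen threshold, while AUC depends only on how the probabilities rank positives against negatives. That is why the two are usually reported together.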