Hyb_SEnc: An Antituberculosis Peptide Predictor Based on a Hybrid Feature Vector and Stacked Ensemble Learning
Xiuhao Fu, Hao Duan, Xiaofeng Zang, Chunling Liu, Xingfeng Li, Qingchen Zhang, Zilong Zhang, Quan Zou, Feifei Cui
IEEE/ACM Transactions on Computational Biology and Bioinformatics (Journal Article), published 2024-07-31. Impact factor 3.6; JCR Q2, Biochemical Research Methods.
DOI: 10.1109/TCBB.2024.3425644 (https://doi.org/10.1109/TCBB.2024.3425644)
Abstract
Tuberculosis has plagued mankind since ancient times, and the struggle between humans and tuberculosis continues. Mycobacterium tuberculosis is the leading cause of tuberculosis, infecting nearly one-third of the world's population. The rise of peptide drugs has opened a new direction in tuberculosis treatment, which makes the prediction of anti-tuberculosis peptides crucial. This paper proposes an anti-tuberculosis peptide prediction method based on hybrid features and stacked ensemble learning. First, a random forest (RF) and an extremely randomized tree (ERT) are selected as the first-level learners of the stacked ensemble. Then, the five best-performing feature encoding methods are combined to obtain the hybrid feature vector, which is refined using decision-tree-based recursive feature elimination (DT-RFE). The resulting optimal feature subset is used as the input to the stacked ensemble model. Logistic regression (LR) serves as the second-level learner of the stack, yielding the final stacked ensemble model, Hyb_SEnc. Hyb_SEnc reached prediction accuracies of 94.68% and 95.74% on the independent test sets of AntiTb_MD and AntiTb_RD, respectively. In addition, we provide a user-friendly web server (http://www.bioailab.com/Hyb_SEnc). The source code is freely available at https://github.com/fxh1001/Hyb_SEnc.
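The architecture outlined in the abstract (DT-RFE feature selection feeding a stack of RF and ERT base learners with an LR meta-learner) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic stand-ins for an encoded peptide feature matrix, and the function name `build_stacked_model`, the number of retained features, the elimination step, the tree counts, and the cross-validation setting are all hypothetical choices that the abstract does not specify.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier


def build_stacked_model(n_features_to_select=20, random_state=0):
    """Assemble a DT-RFE selector followed by an RF + ERT stack with an LR meta-learner.

    All hyperparameters here are illustrative assumptions.
    """
    # Recursive feature elimination ranked by a decision tree's feature importances.
    selector = RFE(
        estimator=DecisionTreeClassifier(random_state=random_state),
        n_features_to_select=n_features_to_select,
        step=5,  # drop 5 features per elimination round
    )
    # First-level learners: random forest and extremely randomized trees.
    # Second-level learner: logistic regression trained on their
    # cross-validated predictions.
    stack = StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=50, random_state=random_state)),
            ("ert", ExtraTreesClassifier(n_estimators=50, random_state=random_state)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )
    return Pipeline([("dt_rfe", selector), ("stack", stack)])


# Synthetic placeholder for a hybrid feature matrix (NOT real peptide data).
X, y = make_classification(
    n_samples=300, n_features=60, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = build_stacked_model()
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.3f}")
```

Placing the selector inside the `Pipeline` means feature elimination is fitted only on training data, which avoids leaking information from the held-out set into the selection step. The accuracy printed here says nothing about the figures reported for Hyb_SEnc, since the inputs are synthetic.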
About the journal:
IEEE/ACM Transactions on Computational Biology and Bioinformatics emphasizes the algorithmic, mathematical, statistical and computational methods that are central to bioinformatics and computational biology; the development and testing of effective computer programs in bioinformatics; the development of biological databases; important biological results obtained from the use of these methods, programs and databases; and the emerging field of Systems Biology, where many forms of data are used to create a computer-based model of a complex biological system.