AMP-EF: An Ensemble Framework of Extreme Gradient Boosting and Bidirectional Long Short-Term Memory Network for Identifying Antimicrobial Peptides
Authors: Shengli Zhang, Ya Zhao, Yunyun Liang
Journal: MATCH Communications in Mathematical and in Computer Chemistry
Published: 2023-10-01
DOI: https://doi.org/10.46793/match.91-1.109z
In recent years, bacterial resistance has become a serious problem due to the abuse of antibiotics. Antimicrobial peptides (AMPs) have emerged as a leading alternative to antibiotics because they can rapidly target bacteria, fungi, viruses, and cancer cells and counteract the toxins they produce. In this study, a two-branch ensemble framework is proposed to identify AMPs; it integrates extreme gradient boosting (XGBoost) and a bidirectional long short-term memory network (Bi-LSTM) with an attention mechanism to form a stronger model. First, one-hot encoding and k-mer features are used to represent the sequences. Then, the feature vectors are fed into the two base classifiers to obtain two predicted values. Finally, the prediction is obtained as a compromise between these two values. As a classical machine learning method, XGBoost is stable and adapts to datasets of different sizes. The Bi-LSTM processes each peptide in both directions, from the N-terminus to the C-terminus and from the C-terminus to the N-terminus. Because context information from both directions is available, the model can make more accurate predictions. Our method achieves higher or highly comparable results across the eight independent test datasets. The accuracy (ACC) values on XUAMP, YADAMP, DRAMP, CAMP, LAMP, APD3, dbAMP, and DBAASP are 77.9%, 98.5%, 72.5%, 99.8%, 83.0%, 92.4%, 87.5%, and 84.6%, respectively. This shows that the two-branch ensemble structure is feasible and generalizes well. The code and datasets are available at https://github.com/z11code/AMP-EF.
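The feature encodings and the final combination step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' code: the abstract does not state the value of k, the padded sequence length, which features feed which branch, or how the "compromise" is computed. The function names, `max_len=50`, `k=2`, and the simple averaging of the two branch probabilities are all assumptions made here for illustration.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(seq, max_len=50):
    """Encode a peptide as a max_len x 20 binary matrix.

    Sequences are zero-padded or truncated to max_len (assumed value).
    Non-standard residues are encoded as all-zero rows.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for pos in range(max_len):
        row = [0] * len(AMINO_ACIDS)
        if pos < len(seq) and seq[pos] in index:
            row[index[seq[pos]]] = 1
        rows.append(row)
    return rows


def kmer_frequencies(seq, k=2):
    """Return normalized k-mer counts in a fixed order over all 20**k k-mers."""
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = dict.fromkeys(vocab, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers containing non-standard residues
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in vocab]


def combine(p_xgb, p_lstm, threshold=0.5):
    """Average two branch probabilities and threshold the result.

    Averaging is only one possible reading of 'compromise' (an assumption).
    """
    p = (p_xgb + p_lstm) / 2
    return p, int(p >= threshold)
```

For example, `combine(0.8, 0.4)` averages the two hypothetical branch outputs to roughly 0.6 and labels the peptide as an AMP under the assumed 0.5 threshold. A weighted average or a learned meta-classifier would be equally plausible readings of the abstract.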
About the journal:
MATCH Communications in Mathematical and in Computer Chemistry publishes papers of original research as well as reviews on chemically important mathematical results and non-routine applications of mathematical techniques to chemical problems. A paper acceptable for publication must contain non-trivial mathematics or communicate non-routine computer-based procedures AND have a clear connection to chemistry. Papers are published without any processing or publication charge.