E-WebGuard: Enhanced neural architectures for precision web attack detection
Authors: Luchen Zhou, Wei-Chuen Yau, Y.S. Gan, Sze-Teng Liong
Journal: Computers & Security, volume 148, Article 104127 (published 2024-09-23)
Impact factor: 4.8; JCR: Q1 (Computer Science, Information Systems)
DOI: 10.1016/j.cose.2024.104127
URL: https://www.sciencedirect.com/science/article/pii/S0167404824004322
Abstract
Web applications have become a favored tool for organizations to disseminate vast amounts of information to the public. As adoption grows, the inherent openness of these applications has led to a surge in web-based attacks. However, most web attack detection studies rely on public datasets that are outdated or cover too few web application attacks. Furthermore, most perform binary detection (i.e., normal or attack), and little work addresses multi-class web attack detection. These gaps highlight the need for automated web attack detection models to bolster web security. In this study, a suite of integrated machine learning and deep learning models is designed to detect web attacks. Specifically, this study employs the Character-level Support Vector Machine (Char-SVM), Character-level Long Short-Term Memory (Char-LSTM), Convolutional Neural Network-SVM (CNN-SVM), and CNN-Bi-LSTM models to distinguish standard HTTP requests from HTTP-based attacks in both the CSIC 2010 and SR-BH 2020 datasets. The CSIC 2010 dataset involves binary classification, while the SR-BH 2020 dataset involves multi-class classification with 13 classes. The input data is converted to the character level before being fed into any of the proposed model architectures. In the binary classification task, the Char-SVM model with a linear kernel outperforms the other models, achieving an accuracy of 99.60%. The CNN-Bi-LSTM model follows closely with 99.41% accuracy, surpassing the CNN-LSTM model presented in previous research. In the multi-class classification task, the CNN-Bi-LSTM model achieves 99.63% accuracy. Furthermore, the multi-class classification models Char-LSTM and CNN-Bi-LSTM achieve validation accuracies above 98%, outperforming the two machine learning-based methods mentioned in the original research.
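The character-level conversion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the vocabulary, sequence length, or padding scheme used in the paper, so the reserved ids, the `max_len` value, the helper names, and the sample requests below are all assumptions made for illustration only.

```python
from typing import Dict, List

# Reserved ids for padding and unseen characters (assumed convention, not from the paper).
PAD, UNK = 0, 1


def build_char_vocab(requests: List[str]) -> Dict[str, int]:
    """Map each distinct character in the training requests to an integer id."""
    chars = sorted({ch for req in requests for ch in req})
    return {ch: i + 2 for i, ch in enumerate(chars)}  # ids 0 and 1 are reserved


def encode_request(request: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Encode one request as a fixed-length list of character ids."""
    ids = [vocab.get(ch, UNK) for ch in request[:max_len]]  # truncate long requests
    return ids + [PAD] * (max_len - len(ids))               # right-pad short ones


# Hypothetical sample requests: one benign, one resembling SQL injection.
train = [
    "GET /index.jsp?id=2",
    "GET /index.jsp?id=2' OR '1'='1",
]
vocab = build_char_vocab(train)
encoded = [encode_request(r, vocab, max_len=40) for r in train]
```

Fixed-length integer sequences of this kind are a common input form for an embedding layer in front of an LSTM or CNN; a character-level SVM would more typically consume a vectorized form such as character n-gram counts, which the abstract also leaves unspecified.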
About the journal:
Computers & Security is the most respected technical journal in the IT security field. With its high-profile editorial board and informative regular features and columns, the journal is essential reading for IT security professionals around the world.
Computers & Security provides you with a unique blend of leading-edge research and sound practical management advice. It is aimed at the professional involved with computer security, audit, control and data integrity in all sectors: industry, commerce and academia. Recognized worldwide as THE primary source of reference for applied research and technical expertise, it is your first step to fully secure systems.