Count vectorizer model based web application vulnerability detection using artificial intelligence approach

Journal of Discrete Mathematical Sciences & Cryptography (IF 1.1; Q2, Mathematics, Applied). Pub Date: 2022-10-03. DOI: 10.1080/09720529.2022.2133243

K. Manjunatha, M. Kempanna

Volume 25, pages 2039-2048.

Citations: 0



Abstract

A web application is a dynamic, intricate, and interactive program that provides end users with information and services such as utility payments, online communication, e-learning, socializing, shopping, online banking, and income tax filing. Web applications have become a major target for attackers because of their accessibility, availability, and ubiquity. Web application vulnerabilities are hazardous for several reasons: among them, attackers can harm an organization's image and standing. Implementation flaws in a web application allow an attacker to inject user input that violates the intended syntactic structure of a query, or to inject malicious code. Among the various types of injection flaws, SQL injection (SQLI) is more prominent than XML injection; both are considered common application-layer web attacks that allow the attacker to bypass security mechanisms, and the two are ranked as the most common vulnerabilities. Hence, a methodology for detecting and evaluating both SQLI and XML injection vulnerabilities in web applications is the subject of this research.

This work addresses the flaws mentioned above and proposes an ensemble method to classify Structured Query Language injection vulnerabilities. We selected a benchmark dataset with 33,758 rows containing various types of SQL and XML injection attacks. The raw data is preprocessed to remove artifacts, and feature engineering is then performed using Natural Language Processing techniques to clean the data and extract six types of features: TF-IDF, Word-to-Vector, SkipGram, Count Vectorizer, GloVe, and Continuous Bag of Words. Imbalanced data is handled using sampling techniques, and the best features are selected using four validation techniques: significance test, PCA, variance threshold, and Sbest.

The prepared data is provided to an ensemble model with two stages. Stage 1 has 9 different types of machine learning models: Multinomial, Gaussian, and Bernoulli Naive Bayes, Logistic Regression, Decision Tree, Random Forest, AdaBoost, and SVC with poly, rbf, and linear kernels. These models are trained on additional vectors such as Google News and GloVe to detect whether a new query, either SQL or XML, contains a vulnerability. Stage 2 accepts a URL from the user and detects the presence of vulnerabilities in its domains and subdomains. Using the proposed ensemble approach, an accuracy of 99% was obtained.
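The abstract names count-vector features and a Multinomial Naive Bayes classifier but gives no implementation details. The following is a minimal sketch of how those two pieces could fit together, not the authors' method. The toy queries, labels, tokenizer, and all function names are hypothetical, and it uses only the Python standard library so that it stays self-contained. The paper's dataset, sampling, feature selection, and the other Stage 1 models are not reproduced.

```python
import math
import re
from collections import Counter

# Hypothetical toy data (label 1 = injection attempt, 0 = benign).
# The paper's 33,758-row benchmark dataset is not reproduced here.
TRAIN = [
    ("SELECT name FROM users WHERE id = 5", 0),
    ("SELECT * FROM products WHERE price < 100", 0),
    ("UPDATE users SET email = 'a@b.com' WHERE id = 7", 0),
    ("SELECT * FROM users WHERE id = 1 OR 1=1 --", 1),
    ("' OR '1'='1' --", 1),
    ("1; DROP TABLE users --", 1),
]

def tokenize(text):
    """Lowercase, then split into word tokens and single punctuation symbols."""
    return re.findall(r"[a-z0-9_]+|[^\sa-z0-9_]", text.lower())

def fit_vocabulary(texts):
    """Map each distinct training token to a column index (sorted order)."""
    tokens = sorted({tok for t in texts for tok in tokenize(t)})
    return {tok: i for i, tok in enumerate(tokens)}

def count_vector(text, vocab):
    """Return the token-count vector of `text`; unseen tokens are ignored."""
    vec = [0] * len(vocab)
    for tok, n in Counter(tokenize(text)).items():
        if tok in vocab:
            vec[vocab[tok]] = n
    return vec

def train_multinomial_nb(vectors, labels, alpha=1.0):
    """Fit Multinomial Naive Bayes with additive (Laplace) smoothing."""
    n_features = len(vectors[0])
    log_prior, log_like = {}, {}
    for c in sorted(set(labels)):
        rows = [v for v, y in zip(vectors, labels) if y == c]
        log_prior[c] = math.log(len(rows) / len(vectors))
        totals = [sum(col) for col in zip(*rows)]  # per-token counts in class c
        denom = sum(totals) + alpha * n_features
        log_like[c] = [math.log((t + alpha) / denom) for t in totals]
    return log_prior, log_like

def predict(vec, model):
    """Return the class with the highest log-posterior score."""
    log_prior, log_like = model
    scores = {
        c: log_prior[c] + sum(x * w for x, w in zip(vec, log_like[c]))
        for c in log_prior
    }
    return max(scores, key=scores.get)

texts = [t for t, _ in TRAIN]
labels = [y for _, y in TRAIN]
vocab = fit_vocabulary(texts)
model = train_multinomial_nb([count_vector(t, vocab) for t in texts], labels)
```

With this toy data, `predict(count_vector("admin' --", vocab), model)` returns 1, because the quote and comment tokens occur far more often in the injection examples. Six hand-written queries say nothing about real-world accuracy; the sketch only illustrates the mechanics of turning a query into counts and scoring it.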
