Text mining based an automatic model for software vulnerability severity prediction

International Journal of System Assurance Engineering and Management | IF 1.6 | Q2, Engineering, Multidisciplinary | Pub Date: 2024-05-31 | DOI: 10.1007/s13198-024-02371-2
Ruchika Malhotra, Vidushi
Citations: 0

Abstract

The number of software vulnerabilities reported each year increases exponentially, which exposes software systems to exploitation. Once a vulnerability is reported, it should be patched as early as possible, but patching takes time and effort. To direct that effort well, the severity of each vulnerability needs to be predicted so that the more critical ones receive higher priority. This calls for a model that analyzes the available vulnerability data and predicts severity. The experiment in this study is conducted on vulnerability reports for five Mozilla software products. Because the data is textual, text mining techniques are applied to preprocess it and form feature vectors. Textual input yields very high-dimensional feature vectors, so dimensionality reduction is required; feature selection is performed using chi-square and information gain. Seven machine learning algorithms are chosen to develop the classifiers. Combining the two feature selection techniques with the seven algorithms gives fourteen software vulnerability severity prediction models (SVSPMs). Analysis of the results identifies the best-performing SVSPM. The models performed better for the medium and critical severity levels. Of the two feature selection techniques, information gain gave better results. The number of features at which the SVSPMs perform well is also determined, as is the best machine learning algorithm for each dataset. Finally, the Friedman and Wilcoxon signed-rank tests are used to identify significant differences among the SVSPMs.
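The feature selection step described in the abstract can be illustrated with a minimal sketch. It assumes binary term-presence features, a small stop-word list, and a hypothetical six-report toy corpus with two severity labels; none of these come from the paper, whose actual preprocessing, Mozilla datasets, and classifier settings are not given here. The function names (`tokenize`, `presence_counts`, `chi_square`, `information_gain`, `top_k`) are illustrative only.

```python
import math
import re
from collections import Counter

# Hypothetical toy reports (assumption); the study's Mozilla data is not reproduced here.
REPORTS = [
    ("remote attacker can execute arbitrary code via crafted page", "critical"),
    ("heap buffer overflow allows arbitrary code execution", "critical"),
    ("use after free leads to remote code execution", "critical"),
    ("address bar spoofing via crafted page", "medium"),
    ("information leak through timing of cached page", "medium"),
    ("dialog spoofing allows information leak", "medium"),
]

# Assumed stop-word list, chosen only for this example.
STOP_WORDS = {"a", "an", "the", "to", "of", "via", "can", "through"}


def tokenize(text):
    """Lowercase, keep alphabetic runs, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def presence_counts(docs):
    """Return documents per class and, per term, documents per class containing it."""
    class_totals = Counter(label for _, label in docs)
    term_class = {}
    for text, label in docs:
        for term in set(tokenize(text)):
            term_class.setdefault(term, Counter())[label] += 1
    return class_totals, term_class


def chi_square(term_counts, class_totals):
    """Pearson chi-square of the (term present / absent) x class table."""
    n = sum(class_totals.values())
    present = sum(term_counts.values())
    score = 0.0
    for label, total in class_totals.items():
        cells = (
            (term_counts[label], present),              # term present
            (total - term_counts[label], n - present),  # term absent
        )
        for observed, row_total in cells:
            expected = row_total * total / n
            if expected > 0:
                score += (observed - expected) ** 2 / expected
    return score


def entropy(counts):
    """Shannon entropy (bits) of a list of non-negative counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(term_counts, class_totals):
    """Class entropy minus class entropy conditioned on term presence."""
    n = sum(class_totals.values())
    present = [term_counts[label] for label in class_totals]
    absent = [class_totals[label] - term_counts[label] for label in class_totals]
    gain = entropy(class_totals.values())
    for split in (present, absent):
        if sum(split) > 0:
            gain -= sum(split) / n * entropy(split)
    return gain


def top_k(docs, scorer, k):
    """Rank terms by score (ties broken alphabetically) and keep the best k."""
    class_totals, term_class = presence_counts(docs)
    scores = {t: scorer(c, class_totals) for t, c in term_class.items()}
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    print("chi-square:", top_k(REPORTS, chi_square, 3))
    print("information gain:", top_k(REPORTS, information_gain, 3))
```

In a full pipeline the selected terms would form the reduced feature vectors passed to each classifier, so two selection techniques and seven algorithms would give the fourteen models the abstract mentions.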

Source journal
CiteScore: 4.30
Self-citation rate: 10.00%
Number of publications: 252
About the journal: This Journal is established with a view to cater to increased awareness for high quality research in the seamless integration of heterogeneous technologies to formulate bankable solutions to the emergent complex engineering problems. Assurance engineering could be thought of as relating to the provision of higher confidence in the reliable and secure implementation of a system’s critical characteristic features through the espousal of a holistic approach by using a wide variety of cross disciplinary tools and techniques. Successful realization of sustainable and dependable products, systems and services involves an extensive adoption of Reliability, Quality, Safety and Risk related procedures for achieving high assurance levels of performance; also pivotal are the management issues related to risk and uncertainty that govern the practical constraints encountered in their deployment. It is our intention to provide a platform for the modeling and analysis of large engineering systems, among the other aforementioned allied goals of systems assurance engineering, leading to the enforcement of performance enhancement measures. Achieving a fine balance between theory and practice is the primary focus. The Journal only publishes high quality papers that have passed the rigorous peer review procedure of an archival scientific Journal. The aim is an increasing number of submissions, wide circulation and a high impact factor.