Text mining based an automatic model for software vulnerability severity prediction

IF 1.6 Q2 ENGINEERING, MULTIDISCIPLINARY International Journal of System Assurance Engineering and Management Pub Date : 2024-05-31 DOI:10.1007/s13198-024-02371-2

Ruchika Malhotra, Vidushi



Abstract

The number of software vulnerabilities reported each year is increasing exponentially, leaving software systems open to exploitation. Once a vulnerability is reported, it should be patched as early as possible, but patching takes time and effort. To direct that effort well, the severity of each vulnerability needs to be predicted so that the more critical ones receive higher priority. This calls for a model that can analyze the available vulnerability data and predict severity. The experiment in this study is conducted on vulnerability reports from five Mozilla software products. Because the data is textual, text mining techniques are applied to preprocess it and form feature vectors. Textual input produces very high-dimensional feature vectors, so dimensionality reduction is required; feature selection is therefore performed using chi-square and information gain. Seven machine learning algorithms are chosen to develop the classifiers. Combining the two feature selection techniques with the seven algorithms yields fourteen software vulnerability severity prediction models (SVSPMs). Result analysis identified the best-performing SVSPM. It is concluded that the models performed better for the medium and critical severity levels. Of the two feature selection techniques, information gain gave better results. The number of features at which the SVSPMs gave good results is also determined, as is the best-performing machine learning algorithm for each dataset. Finally, the Friedman and Wilcoxon signed-rank tests are used to identify significant differences among the SVSPMs developed.
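
The feature selection step described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the six sample reports and their severity labels are invented, the helper names `tokenize`, `score_terms`, and `top_k` are hypothetical, and terms are represented by binary presence per report. The abstract does not state the study's preprocessing, term weighting, or feature counts, so this should not be read as the authors' implementation. The sketch scores each term with a chi-square statistic over the (term present/absent) by (severity class) contingency table, and with information gain, the reduction in class entropy once term presence is known.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic tokens (a deliberately simple preprocessor)."""
    return re.findall(r"[a-z]+", text.lower())


def entropy(counts):
    """Shannon entropy in bits of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def score_terms(docs, labels):
    """Return {term: (chi_square, information_gain)} using binary term presence."""
    n = len(docs)
    classes = sorted(set(labels))
    class_counts = Counter(labels)
    token_sets = [set(tokenize(d)) for d in docs]
    vocab = set().union(*token_sets)
    base_entropy = entropy([class_counts[c] for c in classes])

    scores = {}
    for term in vocab:
        # Per-class count of reports that contain the term.
        present = Counter(lab for toks, lab in zip(token_sets, labels) if term in toks)
        n_present = sum(present.values())
        n_absent = n - n_present

        # Chi-square: sum of (observed - expected)^2 / expected over all cells.
        chi2 = 0.0
        for c in classes:
            cells = ((present[c], n_present), (class_counts[c] - present[c], n_absent))
            for observed, row_total in cells:
                expected = row_total * class_counts[c] / n
                if expected > 0:
                    chi2 += (observed - expected) ** 2 / expected

        # Information gain: H(class) - H(class | term present or absent).
        conditional = 0.0
        if n_present:
            conditional += n_present / n * entropy([present[c] for c in classes])
        if n_absent:
            absent = [class_counts[c] - present[c] for c in classes]
            conditional += n_absent / n * entropy(absent)

        scores[term] = (chi2, base_entropy - conditional)
    return scores


def top_k(scores, k, index):
    """Pick the k best terms by chi-square (index=0) or information gain (index=1)."""
    return sorted(scores, key=lambda t: (-scores[t][index], t))[:k]


# Invented example data, not taken from the Mozilla datasets used in the study.
reports = [
    "crash on startup with null pointer",
    "remote code execution via buffer overflow",
    "minor typo in settings dialog",
    "buffer overflow allows remote attacker",
    "typo in help text",
    "crash when opening settings",
]
severity = ["medium", "critical", "low", "critical", "low", "medium"]

scores = score_terms(reports, severity)
print("chi-square top terms:      ", top_k(scores, 6, 0))
print("information gain top terms:", top_k(scores, 6, 1))
```

In a full pipeline, the selected terms would define the reduced feature vectors passed to each classifier, and the number of retained terms `k` would be varied to find the point at which the models perform well.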




Source journal

International Journal of System Assurance Engineering and Management (ENGINEERING, MULTIDISCIPLINARY)

CiteScore

4.30

Self-citation rate

10.00%

Articles published

252

Journal description: This Journal is established with a view to cater to increased awareness for high quality research in the seamless integration of heterogeneous technologies to formulate bankable solutions to the emergent complex engineering problems. Assurance engineering could be thought of as relating to the provision of higher confidence in the reliable and secure implementation of a system’s critical characteristic features through the espousal of a holistic approach by using a wide variety of cross disciplinary tools and techniques. Successful realization of sustainable and dependable products, systems and services involves an extensive adoption of Reliability, Quality, Safety and Risk related procedures for achieving high assurance levels of performance; also pivotal are the management issues related to risk and uncertainty that govern the practical constraints encountered in their deployment. It is our intention to provide a platform for the modeling and analysis of large engineering systems, among the other aforementioned allied goals of systems assurance engineering, leading to the enforcement of performance enhancement measures. Achieving a fine balance between theory and practice is the primary focus. The Journal only publishes high quality papers that have passed the rigorous peer review procedure of an archival scientific Journal. The aim is an increasing number of submissions, wide circulation and a high impact factor.