{"title":"Text mining based an automatic model for software vulnerability severity prediction","authors":"Ruchika Malhotra, Vidushi","doi":"10.1007/s13198-024-02371-2","DOIUrl":null,"url":null,"abstract":"<p>Software vulnerabilities reported every year increase exponentially, leading to the exploitation of software systems. Hence, when a vulnerability is reported, a requirement arises to patch it as early as possible. Generally, this process requires some time and effort. For proper channelizing of the efforts, a requirement comes to predict the severity of the vulnerability so that the more critical ones can be given a higher priority. Therefore, a need arises to build a model that can analyze the data available on vulnerabilities and predict their severity. The experiment of this study is conducted on vulnerability reports of five software of Mozilla. As the data is textual, text mining techniques are applied to preprocess the data and form feature vectors. This input as text creates very high dimensional feature vectors leading to the requirement of dimensionality reduction. Hence, feature selection is done using chi-square and information gain. To develop the classifier, seven machine learning algorithms are chosen. Hence, fourteen software vulnerability severity prediction models (SVSPM) are developed. The result analysis allowed us to find the best-performing SVSPM. It is concluded that the model performed better for the medium and the critical severity level of the vulnerability. Out of the two feature selection techniques, information gain gave better results. An optimum number of features is also determined at which SVSPM gave good results. The best SVSPM using a machine learning algorithm corresponding to each dataset is found as well. A comparison is also made to identify significant differences among various SVSPMs developed using Friedman and Wilcoxon Signed Rank test.</p>","PeriodicalId":14463,"journal":{"name":"International Journal of System Assurance Engineering and Management","volume":"41 1","pages":""},"PeriodicalIF":1.6000,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of System Assurance Engineering and Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s13198-024-02371-2","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 0
Abstract
Software vulnerabilities reported every year increase exponentially, leading to the exploitation of software systems. Hence, when a vulnerability is reported, a requirement arises to patch it as early as possible. Generally, this process requires some time and effort. For proper channelizing of the efforts, a requirement comes to predict the severity of the vulnerability so that the more critical ones can be given a higher priority. Therefore, a need arises to build a model that can analyze the data available on vulnerabilities and predict their severity. The experiment of this study is conducted on vulnerability reports of five software of Mozilla. As the data is textual, text mining techniques are applied to preprocess the data and form feature vectors. This input as text creates very high dimensional feature vectors leading to the requirement of dimensionality reduction. Hence, feature selection is done using chi-square and information gain. To develop the classifier, seven machine learning algorithms are chosen. Hence, fourteen software vulnerability severity prediction models (SVSPM) are developed. The result analysis allowed us to find the best-performing SVSPM. It is concluded that the model performed better for the medium and the critical severity level of the vulnerability. Out of the two feature selection techniques, information gain gave better results. An optimum number of features is also determined at which SVSPM gave good results. The best SVSPM using a machine learning algorithm corresponding to each dataset is found as well. A comparison is also made to identify significant differences among various SVSPMs developed using Friedman and Wilcoxon Signed Rank test.
期刊介绍:
This Journal is established with a view to cater to increased awareness for high quality research in the seamless integration of heterogeneous technologies to formulate bankable solutions to the emergent complex engineering problems.
Assurance engineering could be thought of as relating to the provision of higher confidence in the reliable and secure implementation of a system’s critical characteristic features through the espousal of a holistic approach by using a wide variety of cross disciplinary tools and techniques. Successful realization of sustainable and dependable products, systems and services involves an extensive adoption of Reliability, Quality, Safety and Risk related procedures for achieving high assurancelevels of performance; also pivotal are the management issues related to risk and uncertainty that govern the practical constraints encountered in their deployment. It is our intention to provide a platform for the modeling and analysis of large engineering systems, among the other aforementioned allied goals of systems assurance engineering, leading to the enforcement of performance enhancement measures. Achieving a fine balance between theory and practice is the primary focus. The Journal only publishes high quality papers that have passed the rigorous peer review procedure of an archival scientific Journal. The aim is an increasing number of submissions, wide circulation and a high impact factor.