Software security with natural language processing and vulnerability scoring using machine learning approach

3区 计算机科学 Q1 Computer Science Journal of Ambient Intelligence and Humanized Computing Pub Date : 2024-04-03 DOI:10.1007/s12652-024-04778-y
Birendra Kumar Verma, Ajay Kumar Yadav
{"title":"Software security with natural language processing and vulnerability scoring using machine learning approach","authors":"Birendra Kumar Verma, Ajay Kumar Yadav","doi":"10.1007/s12652-024-04778-y","DOIUrl":null,"url":null,"abstract":"<p>As software gets more complicated, diverse, and crucial to people’s daily lives, exploitable software vulnerabilities constitute a major security risk to the computer system. These vulnerabilities allow unauthorized access, which can cause losses in banking, energy, the military, healthcare, and other key infrastructure systems. Most vulnerability scoring methods employ Natural Language Processing to generate models from descriptions. These models ignore Impact scores, Exploitability scores, Attack Complexity and other statistical features when scoring vulnerabilities. A feature vector for machine learning models is created from a description, impact score, exploitability score, attack complexity score, etc. We score vulnerabilities more precisely than we categorize them. The Decision Tree Regressor, Random Forest Regressor, AdaBoost Regressor, K-nearest Neighbors Regressor, and Support Vector Regressor have been evaluated using the metrics explained variance, r-squared, mean absolute error, mean squared error, and root mean squared error. The tenfold cross-validation method verifies regressor test results. The research uses 193,463 Common Vulnerabilities and Exposures from the National Vulnerability Database. The Random Forest regressor performed well on four of the five criteria, and the tenfold cross-validation test performed even better (0.9968 vs. 0.9958).</p>","PeriodicalId":14959,"journal":{"name":"Journal of Ambient Intelligence and Humanized Computing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Ambient Intelligence and Humanized Computing","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s12652-024-04778-y","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"Computer Science","Score":null,"Total":0}
引用次数: 0

Abstract

As software gets more complicated, diverse, and crucial to people’s daily lives, exploitable software vulnerabilities constitute a major security risk to the computer system. These vulnerabilities allow unauthorized access, which can cause losses in banking, energy, the military, healthcare, and other key infrastructure systems. Most vulnerability scoring methods employ Natural Language Processing to generate models from descriptions. These models ignore Impact scores, Exploitability scores, Attack Complexity and other statistical features when scoring vulnerabilities. A feature vector for machine learning models is created from a description, impact score, exploitability score, attack complexity score, etc. We score vulnerabilities more precisely than we categorize them. The Decision Tree Regressor, Random Forest Regressor, AdaBoost Regressor, K-nearest Neighbors Regressor, and Support Vector Regressor have been evaluated using the metrics explained variance, r-squared, mean absolute error, mean squared error, and root mean squared error. The tenfold cross-validation method verifies regressor test results. The research uses 193,463 Common Vulnerabilities and Exposures from the National Vulnerability Database. The Random Forest regressor performed well on four of the five criteria, and the tenfold cross-validation test performed even better (0.9968 vs. 0.9958).

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
利用自然语言处理和机器学习方法进行漏洞评分的软件安全性
随着软件变得越来越复杂、多样,而且对人们的日常生活越来越重要,可利用的软件漏洞对计算机系统构成了重大的安全风险。这些漏洞允许未经授权的访问,会给银行、能源、军事、医疗保健和其他关键基础设施系统造成损失。大多数漏洞评分方法都采用自然语言处理技术,从描述中生成模型。这些模型在对漏洞进行评分时会忽略影响得分、可开发性得分、攻击复杂性和其他统计特征。机器学习模型的特征向量由描述、影响得分、可利用性得分、攻击复杂性得分等创建。我们对漏洞的评分比对漏洞的分类更精确。使用解释方差、r 平方、平均绝对误差、平均平方误差和均方根误差等指标对决策树回归器、随机森林回归器、AdaBoost 回归器、K-近邻回归器和支持向量回归器进行了评估。十倍交叉验证法验证了回归器的测试结果。研究使用了国家脆弱性数据库中的 193,463 个常见脆弱性和暴露。随机森林回归器在五项标准中的四项上表现良好,十倍交叉验证测试的表现甚至更好(0.9968 对 0.9958)。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Journal of Ambient Intelligence and Humanized Computing
Journal of Ambient Intelligence and Humanized Computing COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCEC-COMPUTER SCIENCE, INFORMATION SYSTEMS
CiteScore
9.60
自引率
0.00%
发文量
854
期刊介绍: The purpose of JAIHC is to provide a high profile, leading edge forum for academics, industrial professionals, educators and policy makers involved in the field to contribute, to disseminate the most innovative researches and developments of all aspects of ambient intelligence and humanized computing, such as intelligent/smart objects, environments/spaces, and systems. The journal discusses various technical, safety, personal, social, physical, political, artistic and economic issues. The research topics covered by the journal are (but not limited to): Pervasive/Ubiquitous Computing and Applications Cognitive wireless sensor network Embedded Systems and Software Mobile Computing and Wireless Communications Next Generation Multimedia Systems Security, Privacy and Trust Service and Semantic Computing Advanced Networking Architectures Dependable, Reliable and Autonomic Computing Embedded Smart Agents Context awareness, social sensing and inference Multi modal interaction design Ergonomics and product prototyping Intelligent and self-organizing transportation networks & services Healthcare Systems Virtual Humans & Virtual Worlds Wearables sensors and actuators
期刊最新文献
Predicting the unconfined compressive strength of stabilized soil using random forest coupled with meta-heuristic algorithms Expressive sign language system for deaf kids with MPEG-4 approach of virtual human character MEDCO: an efficient protocol for data compression in wireless body sensor network A multi-objective gene selection for cancer diagnosis using particle swarm optimization and mutual information Partial policy hidden medical data access control method based on CP-ABE
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1