Learning to Identify Security-Related Issues Using Convolutional Neural Networks

2019 IEEE International Conference on Software Maintenance and Evolution (ICSME). Pub Date: 2019-08-01. DOI: 10.1109/ICSME.2019.00024

David N. Palacio, Daniel McCrystal, Kevin Moran, Carlos Bernal-Cárdenas, D. Poshyvanyk, Chris Shenefiel


Citations: 12

Abstract

Software security is becoming a high priority for large companies and start-ups alike because of the increasing potential for harm that vulnerabilities and breaches carry. However, attaining robust security assurance while delivering features requires a precarious balancing act in the context of agile development practices. One way to help development teams secure their software products is to design and develop security-focused automation. We therefore present a novel approach, called SecureReqNet, for automatically identifying whether issues in software issue tracking systems describe security-related content. Our approach consists of a two-phase neural network architecture that operates purely on the natural language descriptions of issues. The first phase learns high-dimensional word embeddings from hundreds of thousands of vulnerability descriptions listed in the CVE database and issue descriptions extracted from open source projects. The second phase then uses the semantic ontology represented by these embeddings to train a convolutional neural network capable of predicting whether a given issue is security-related. We evaluated SecureReqNet by applying it to identify security-related issues in a dataset of thousands of issues mined from popular projects on GitLab and GitHub. We also applied our approach to identify security-related requirements in a commercial software project developed by a major telecommunication company. Our preliminary results are encouraging, with SecureReqNet achieving an accuracy of 96% on open source issues and 71.6% on industrial requirements.
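To make the two-phase idea concrete, the following is a minimal sketch of how an embedding lookup can feed a convolutional text classifier. It is not SecureReqNet itself: the abstract gives no filter sizes, embedding dimension, or training procedure, so every name here (`tokenize`, `build_embeddings`, `conv_max_pool`, `predict`) is hypothetical, the embeddings are random stand-ins for the learned ones, and the weights are untrained. The sketch only illustrates the data flow from issue text to a score between 0 and 1.

```python
import math
import random


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def build_embeddings(vocab, dim, seed=0):
    """Stand-in for phase 1: random vectors, NOT embeddings learned from CVE text."""
    rng = random.Random(seed)
    return {word: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for word in vocab}


def embed(tokens, embeddings, dim):
    """Map tokens to vectors; unknown tokens become zero vectors."""
    return [embeddings.get(tok, [0.0] * dim) for tok in tokens]


def conv_max_pool(matrix, kernel, bias=0.0):
    """Slide one filter over token windows, apply ReLU, then max-over-time pooling."""
    width = len(kernel)
    best = 0.0  # starting at 0 folds the ReLU into the running maximum
    for start in range(len(matrix) - width + 1):
        total = bias
        for offset in range(width):
            row, weights = matrix[start + offset], kernel[offset]
            total += sum(x * w for x, w in zip(row, weights))
        best = max(best, total)
    return best


def predict(text, embeddings, kernels, out_weights, out_bias, dim):
    """Phase 2 forward pass: embed, convolve, pool, then a sigmoid output unit."""
    matrix = embed(tokenize(text), embeddings, dim)
    features = [conv_max_pool(matrix, kernel) for kernel in kernels]
    logit = out_bias + sum(f * w for f, w in zip(features, out_weights))
    return 1.0 / (1.0 + math.exp(-logit))


# Illustrative setup with assumed sizes: 4-dimensional embeddings, two filters of width 2.
DIM = 4
issue = "SQL injection possible in the login form"
embeddings = build_embeddings(set(tokenize(issue)), DIM)
rng = random.Random(1)
kernels = [[[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(2)] for _ in range(2)]
out_weights = [rng.uniform(-1, 1) for _ in kernels]
score = predict(issue, embeddings, kernels, out_weights, 0.0, DIM)
```

Because nothing here is trained, `score` carries no meaning beyond lying in the unit interval. In a real pipeline the embeddings and filter weights would be learned from labeled issues, and accuracy would be measured on held-out data.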

