Automated unearthing of dangerous issue reports

Software Industry and Engineering (软件产业与工程) | Pub Date: 2022-11-07 | DOI: 10.1145/3540250.3549156

Shengyi Pan, Jiayuan Zhou, F. R. Côgo, Xin Xia, Lingfeng Bao, Xing Hu, Shanping Li, Ahmed E. Hassan


Citations: 3

Abstract

The coordinated vulnerability disclosure (CVD) process is commonly adopted for open source software (OSS) vulnerability management. It recommends reporting discovered vulnerabilities privately and keeping the relevant information secret until official disclosure. In practice, however, for various reasons (e.g., a lack of security domain expertise or of security management awareness), many vulnerabilities are first reported via public issue reports (IRs) before their official disclosure. Such IRs are dangerous, since attackers can take advantage of the leaked vulnerability information to launch zero-day attacks. It is crucial to identify dangerous IRs at an early stage, so that OSS users can start the vulnerability remediation process earlier and OSS maintainers can manage the dangerous IRs in a timely manner. In this paper, we propose and evaluate a deep learning-based approach, MemVul, that automatically identifies dangerous IRs at the time they are reported. MemVul augments neural networks with a memory component that stores external vulnerability knowledge from the Common Weakness Enumeration (CWE). We rely on publicly accessible CVE-referred IRs (CIRs) to operationalize the concept of a dangerous IR. We mine 3,937 CIRs distributed across 1,390 OSS projects hosted on GitHub. Evaluated under a practical scenario of high data imbalance, MemVul achieves the best trade-off between precision and recall compared with all baselines. In particular, the F1-score of MemVul (i.e., 0.49) improves on the best-performing baseline by 44%. For IRs that are predicted as CIRs but not reported to CVE, we conduct a user study to investigate their usefulness to OSS stakeholders. We observe that 82% (41 out of 50) of these IRs are security-related and that security experts suggest 28 of them be publicly disclosed, indicating that MemVul is capable of identifying undisclosed dangerous IRs.
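The figures quoted in the abstract can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes that "by 44%" denotes a relative improvement in F1-score (the abstract does not say so explicitly), and the helper names `f1_score` and `implied_baseline` are hypothetical, not taken from the paper or its artifact.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_baseline(improved: float, relative_gain: float) -> float:
    """Return the baseline b such that improved == b * (1 + relative_gain)."""
    return improved / (1 + relative_gain)


# Figures quoted in the abstract.
memvul_f1 = 0.49
relative_gain = 0.44  # assumption: "by 44%" is a relative improvement

baseline_f1 = implied_baseline(memvul_f1, relative_gain)
print(f"implied best-baseline F1: {baseline_f1:.2f}")  # -> 0.34

# User study: 41 of the 50 sampled IRs were judged security-related.
security_related = 41 / 50
print(f"security-related share: {security_related:.0%}")  # -> 82%

# Hypothetical precision/recall pair, NOT from the paper, showing the formula.
print(f"example F1: {f1_score(0.5, 0.5):.2f}")  # -> 0.50
```

Under that assumption, the best-performing baseline would have an F1-score of roughly 0.34. The abstract does not report the individual precision and recall values, so they are not reconstructed here.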
