Using Twitter to Predict When Vulnerabilities will be Exploited

Haipeng Chen, R. Liu, Noseong Park, V. S. Subrahmanian
Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, published 2019-07-25
DOI: 10.1145/3292500.3330742
Citations: 30

Abstract

When a new cyber-vulnerability is detected, a Common Vulnerabilities and Exposures (CVE) number is attached to it. Malicious "exploits" may use these vulnerabilities to carry out attacks. Prior work studies whether a CVE will be used in an exploit. We instead study the problem of predicting when an exploit is first seen. This question matters to system administrators, who must allocate scarce resources to corrective action when a new vulnerability emerges. Moreover, past work assumes that CVSS scores (released by NIST) are available for prediction, but we show that, on average, 49% of real-world exploits occur before CVSS scores are published. Past approaches that rely on CVSS scores therefore miss almost half of the exploits. In this paper, we propose a novel framework that predicts, from Twitter discussion and without CVSS score information, when a vulnerability will be exploited. We introduce a family of CVE-Author-Tweet (CAT) graphs and build a novel set of features based on them. We define recurrence relations capturing the "hotness" of tweets, the "expertise" of Twitter users on CVEs, and the "availability" of information about CVEs, and prove that these recurrences can be solved with a fixed-point algorithm. Our second innovation uses Hawkes processes to estimate the number of tweets and retweets related to each CVE. Using these two sets of features, we propose two ensemble forecast models, FEEU (for classification) and FRET (for regression), to predict when a CVE will be exploited. Compared with natural adaptations of past work (which predicts whether an exploit will be used), FEEU increases the F1 score by 25.1%, while FRET decreases the mean absolute error (MAE) by 37.2%.
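The abstract states that the hotness, expertise, and availability recurrences over CAT graphs are solved by a fixed-point algorithm, but it does not give the recurrences themselves. The sketch below illustrates the general technique under hypothetical recurrences that are not the paper's definitions. In this sketch, author expertise is the mean hotness of the author's tweets, and CVE availability is the mean hotness of the tweets that mention the CVE. Tweet hotness is a damped mix: (1 - d) times a prior (here `log1p(retweets)`, an assumption) plus d times the average of author expertise and CVE availability. The names `cat_scores` and `d` are illustrative. Each update only averages existing scores, so the map is a contraction with factor d in the max norm. By the Banach fixed-point theorem, the iteration therefore converges geometrically to a unique fixed point.

```python
import math
from collections import defaultdict


def cat_scores(tweets, d=0.5, tol=1e-10, max_iter=1000):
    """Jointly score tweet hotness, author expertise and CVE availability.

    `tweets` is a non-empty list of (author, cve_ids, retweet_count) tuples,
    and every tweet is assumed to mention at least one CVE. The recurrences
    are hypothetical illustrations, not the paper's definitions.
    """
    base = [math.log1p(rt) for _, _, rt in tweets]  # hypothetical prior score
    by_author, by_cve = defaultdict(list), defaultdict(list)
    for i, (author, cves, _) in enumerate(tweets):
        by_author[author].append(i)
        for c in cves:
            by_cve[c].append(i)

    def mean(values):
        return sum(values) / len(values)

    def context(hot):
        # Expertise: mean hotness of an author's tweets.
        expertise = {a: mean([hot[i] for i in idx]) for a, idx in by_author.items()}
        # Availability: mean hotness of the tweets that mention a CVE.
        availability = {c: mean([hot[i] for i in idx]) for c, idx in by_cve.items()}
        return expertise, availability

    def step(hot):
        # Hotness: damped mix of the prior with author and CVE context.
        expertise, availability = context(hot)
        return [
            (1 - d) * base[i]
            + d * 0.5 * (expertise[a] + mean([availability[c] for c in cves]))
            for i, (a, cves, _) in enumerate(tweets)
        ]

    hot = base[:]
    for _ in range(max_iter):
        new_hot = step(hot)
        delta = max(abs(x - y) for x, y in zip(new_hot, hot))
        hot = new_hot
        if delta < tol:  # successive iterates agree: at the fixed point
            break
    expertise, availability = context(hot)
    return hot, expertise, availability


tweets = [
    ("alice", ["CVE-A"], 10),
    ("alice", ["CVE-A", "CVE-B"], 0),
    ("bob", ["CVE-B"], 3),
]
hot, expertise, availability = cat_scores(tweets)
```

Keeping every update an average, rather than a sum, is what guarantees convergence here. A sum-based availability score would need explicit normalization to remain a contraction.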
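The abstract says Hawkes processes are used to estimate the number of tweets and retweets about a CVE, but it does not specify the model. As a minimal sketch, assume a univariate Hawkes process with an exponential kernel whose parameters (baseline rate `mu`, branching ratio `alpha`, decay rate `beta`) have already been fitted. Under these assumptions, the expected number of tweets in a future window has a closed form. The residual excitation from observed tweets decays at rate `beta * (1 - alpha)`, toward the stationary rate `mu / (1 - alpha)`. The function name and its parameters are hypothetical.

```python
import math


def expected_future_count(history, mu, alpha, beta, now, horizon):
    """Expected number of events in (now, now + horizon] of a Hawkes process.

    Conditional intensity with an exponential kernel:
        lambda(t) = mu + sum_{t_i < t} alpha * beta * exp(-beta * (t - t_i))
    Each event triggers alpha follow-up events on average (branching ratio),
    so alpha < 1 is required for the process to stay stable. The parameters
    are assumed to be fitted beforehand, e.g. by maximum likelihood.
    """
    if not 0 <= alpha < 1 or beta <= 0 or mu < 0:
        raise ValueError("need mu >= 0, 0 <= alpha < 1 and beta > 0")
    # Excess intensity at `now` left over from the observed events.
    z0 = sum(alpha * beta * math.exp(-beta * (now - t)) for t in history if t <= now)
    # E[lambda(now + s)] = mu + z_star + (z0 - z_star) * exp(-kappa * s)
    z_star = alpha * mu / (1 - alpha)  # stationary excess intensity
    kappa = beta * (1 - alpha)         # decay rate of the transient
    steady = (mu + z_star) * horizon
    transient = (z0 - z_star) * (1 - math.exp(-kappa * horizon)) / kappa
    return steady + transient
```

For example, `expected_future_count(tweet_times, mu, alpha, beta, now, horizon=24.0)` would estimate the tweets expected over the next 24 time units. The caller must express all times in the same unit.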