赢得IOC游戏:走向自动发现和分析开源网络威胁情报

Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security Pub Date : 2016-10-24 DOI:10.1145/2976749.2978315

Xiaojing Liao, Kan Yuan, Xiaofeng Wang, Zhou Li, Luyi Xing, R. Beyah

{"title":"赢得IOC游戏:走向自动发现和分析开源网络威胁情报","authors":"Xiaojing Liao, Kan Yuan, Xiaofeng Wang, Zhou Li, Luyi Xing, R. Beyah","doi":"10.1145/2976749.2978315","DOIUrl":null,"url":null,"abstract":"To adapt to the rapidly evolving landscape of cyber threats, security professionals are actively exchanging Indicators of Compromise (IOC) (e.g., malware signatures, botnet IPs) through public sources (e.g. blogs, forums, tweets, etc.). Such information, often presented in articles, posts, white papers etc., can be converted into a machine-readable OpenIOC format for automatic analysis and quick deployment to various security mechanisms like an intrusion detection system. With hundreds of thousands of sources in the wild, the IOC data are produced at a high volume and velocity today, which becomes increasingly hard to manage by humans. Efforts to automatically gather such information from unstructured text, however, is impeded by the limitations of today's Natural Language Processing (NLP) techniques, which cannot meet the high standard (in terms of accuracy and coverage) expected from the IOCs that could serve as direct input to a defense system. In this paper, we present iACE, an innovation solution for fully automated IOC extraction. Our approach is based upon the observation that the IOCs in technical articles are often described in a predictable way: being connected to a set of context terms (e.g., \"download\") through stable grammatical relations. Leveraging this observation, iACE is designed to automatically locate a putative IOC token (e.g., a zip file) and its context (e.g., \"malware\", \"download\") within the sentences in a technical article, and further analyze their relations through a novel application of graph mining techniques. Once the grammatical connection between the tokens is found to be in line with the way that the IOC is commonly presented, these tokens are extracted to generate an OpenIOC item that describes not only the indicator (e.g., a malicious zip file) but also its context (e.g., download from an external source). Running on 71,000 articles collected from 45 leading technical blogs, this new approach demonstrates a remarkable performance: it generated 900K OpenIOC items with a precision of 95% and a coverage over 90%, which is way beyond what the state-of-the-art NLP technique and industry IOC tool can achieve, at a speed of thousands of articles per hour. Further, by correlating the IOCs mined from the articles published over a 13-year span, our study sheds new light on the links across hundreds of seemingly unrelated attack instances, particularly their shared infrastructure resources, as well as the impacts of such open-source threat intelligence on security protection and evolution of attack strategies.","PeriodicalId":432261,"journal":{"name":"Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security","volume":"64 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"206","resultStr":"{\"title\":\"Acing the IOC Game: Toward Automatic Discovery and Analysis of Open-Source Cyber Threat Intelligence\",\"authors\":\"Xiaojing Liao, Kan Yuan, Xiaofeng Wang, Zhou Li, Luyi Xing, R. Beyah\",\"doi\":\"10.1145/2976749.2978315\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"To adapt to the rapidly evolving landscape of cyber threats, security professionals are actively exchanging Indicators of Compromise (IOC) (e.g., malware signatures, botnet IPs) through public sources (e.g. blogs, forums, tweets, etc.). Such information, often presented in articles, posts, white papers etc., can be converted into a machine-readable OpenIOC format for automatic analysis and quick deployment to various security mechanisms like an intrusion detection system. With hundreds of thousands of sources in the wild, the IOC data are produced at a high volume and velocity today, which becomes increasingly hard to manage by humans. Efforts to automatically gather such information from unstructured text, however, is impeded by the limitations of today's Natural Language Processing (NLP) techniques, which cannot meet the high standard (in terms of accuracy and coverage) expected from the IOCs that could serve as direct input to a defense system. In this paper, we present iACE, an innovation solution for fully automated IOC extraction. Our approach is based upon the observation that the IOCs in technical articles are often described in a predictable way: being connected to a set of context terms (e.g., \\\"download\\\") through stable grammatical relations. Leveraging this observation, iACE is designed to automatically locate a putative IOC token (e.g., a zip file) and its context (e.g., \\\"malware\\\", \\\"download\\\") within the sentences in a technical article, and further analyze their relations through a novel application of graph mining techniques. Once the grammatical connection between the tokens is found to be in line with the way that the IOC is commonly presented, these tokens are extracted to generate an OpenIOC item that describes not only the indicator (e.g., a malicious zip file) but also its context (e.g., download from an external source). Running on 71,000 articles collected from 45 leading technical blogs, this new approach demonstrates a remarkable performance: it generated 900K OpenIOC items with a precision of 95% and a coverage over 90%, which is way beyond what the state-of-the-art NLP technique and industry IOC tool can achieve, at a speed of thousands of articles per hour. Further, by correlating the IOCs mined from the articles published over a 13-year span, our study sheds new light on the links across hundreds of seemingly unrelated attack instances, particularly their shared infrastructure resources, as well as the impacts of such open-source threat intelligence on security protection and evolution of attack strategies.\",\"PeriodicalId\":432261,\"journal\":{\"name\":\"Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security\",\"volume\":\"64 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-10-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"206\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2976749.2978315\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2976749.2978315","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 206

摘要

为了适应快速发展的网络威胁，安全专业人员正在通过公共来源(如博客、论坛、推文等)积极交换入侵指标(IOC)(如恶意软件签名、僵尸网络ip)。这些信息通常出现在文章、帖子、白皮书等中，可以转换为机器可读的OpenIOC格式，以便自动分析和快速部署到入侵检测系统等各种安全机制中。在野外有成千上万的来源，如今IOC数据以高容量和高速度产生，这变得越来越难以由人类管理。然而，从非结构化文本中自动收集此类信息的努力受到当今自然语言处理(NLP)技术的限制，这些技术无法满足ioc所期望的高标准(在准确性和覆盖范围方面)，ioc可以作为防御系统的直接输入。在本文中，我们提出了iACE，一种全自动IOC提取的创新解决方案。我们的方法是基于这样一种观察:技术文章中的ioc通常以一种可预测的方式描述:通过稳定的语法关系连接到一组上下文术语(例如，“download”)。利用这种观察，iACE被设计为在技术文章的句子中自动定位假定的IOC令牌(例如，zip文件)及其上下文(例如，“恶意软件”，“下载”)，并通过图形挖掘技术的新应用进一步分析它们之间的关系。一旦发现令牌之间的语法连接与IOC通常表示的方式一致，就提取这些令牌以生成一个OpenIOC项，该项不仅描述指示符(例如，恶意zip文件)，还描述其上下文(例如，从外部源下载)。在从45个领先的技术博客中收集的71,000篇文章上运行，这种新方法展示了非凡的性能:它生成了900K个OpenIOC条目，精度为95%，覆盖率超过90%，这远远超出了最先进的NLP技术和行业IOC工具所能达到的速度，每小时数千篇文章。此外，通过将从13年发表的文章中挖掘的ioc进行关联，我们的研究揭示了数百个看似无关的攻击实例之间的联系，特别是它们共享的基础设施资源，以及此类开源威胁情报对安全保护和攻击策略演变的影响。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Acing the IOC Game: Toward Automatic Discovery and Analysis of Open-Source Cyber Threat Intelligence

To adapt to the rapidly evolving landscape of cyber threats, security professionals are actively exchanging Indicators of Compromise (IOC) (e.g., malware signatures, botnet IPs) through public sources (e.g. blogs, forums, tweets, etc.). Such information, often presented in articles, posts, white papers etc., can be converted into a machine-readable OpenIOC format for automatic analysis and quick deployment to various security mechanisms like an intrusion detection system. With hundreds of thousands of sources in the wild, the IOC data are produced at a high volume and velocity today, which becomes increasingly hard to manage by humans. Efforts to automatically gather such information from unstructured text, however, is impeded by the limitations of today's Natural Language Processing (NLP) techniques, which cannot meet the high standard (in terms of accuracy and coverage) expected from the IOCs that could serve as direct input to a defense system. In this paper, we present iACE, an innovation solution for fully automated IOC extraction. Our approach is based upon the observation that the IOCs in technical articles are often described in a predictable way: being connected to a set of context terms (e.g., "download") through stable grammatical relations. Leveraging this observation, iACE is designed to automatically locate a putative IOC token (e.g., a zip file) and its context (e.g., "malware", "download") within the sentences in a technical article, and further analyze their relations through a novel application of graph mining techniques. Once the grammatical connection between the tokens is found to be in line with the way that the IOC is commonly presented, these tokens are extracted to generate an OpenIOC item that describes not only the indicator (e.g., a malicious zip file) but also its context (e.g., download from an external source). Running on 71,000 articles collected from 45 leading technical blogs, this new approach demonstrates a remarkable performance: it generated 900K OpenIOC items with a precision of 95% and a coverage over 90%, which is way beyond what the state-of-the-art NLP technique and industry IOC tool can achieve, at a speed of thousands of articles per hour. Further, by correlating the IOCs mined from the articles published over a 13-year span, our study sheds new light on the links across hundreds of seemingly unrelated attack instances, particularly their shared infrastructure resources, as well as the impacts of such open-source threat intelligence on security protection and evolution of attack strategies.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security

自引率

0.00%

发文量

期刊最新文献

∑oφoς: Forward Secure Searchable Encryption Accessorize to a Crime: Real and Stealthy Attacks on State-of-the-Art Face Recognition Message-Recovery Attacks on Feistel-Based Format Preserving Encryption iLock: Immediate and Automatic Locking of Mobile Devices against Data Theft Prefetch Side-Channel Attacks: Bypassing SMAP and Kernel ASLR