从非结构化文本中挖掘网络威胁情报的文献综述

2020 International Conference on Data Mining Workshops (ICDMW) Pub Date : 2020-11-01 DOI:10.1109/ICDMW51313.2020.00075

Md. Rayhanur Rahman, Rezvan Mahdavi-Hezaveh, L. Williams

{"title":"从非结构化文本中挖掘网络威胁情报的文献综述","authors":"Md. Rayhanur Rahman, Rezvan Mahdavi-Hezaveh, L. Williams","doi":"10.1109/ICDMW51313.2020.00075","DOIUrl":null,"url":null,"abstract":"Cyberthreat defense mechanisms have become more proactive these days, and thus leading to the increasing incorporation of cyberthreat intelligence (CTI). Cybersecurity researchers and vendors are powering the CTI with large volumes of unstructured textual data containing information on threat events, threat techniques, and tactics. Hence, extracting cyberthreat-relevant information through text mining is an effective way to obtain actionable CTI to thwart cyberattacks. The goal of this research is to aid cybersecurity researchers understand the source, purpose, and approaches for mining cyberthreat intelligence from unstructured text through a literature review of peer-reviewed studies on this topic. We perform a literature review to identify and analyze existing research on mining CTI. By using search queries in the bibliographic databases, 28,484 articles are found. From those, 38 studies are identified through the filtering criteria which include removing duplicates, non-English, non-peer-reviewed articles, and articles not about mining CTI. We find that the most prominent sources of unstructured threat data are the threat reports, Twitter feeds, and posts from hackers and security experts. We also observe that security researchers mined CTI from unstructured sources to extract Indicator of Compromise (IoC), threat-related topic, and event detection. Finally, natural language processing (NLP) based approaches: topic classification; keyword identification; and semantic relationship extraction among the keywords are mostly availed in the selected studies to mine CTI information from unstructured threat sources.","PeriodicalId":426846,"journal":{"name":"2020 International Conference on Data Mining Workshops (ICDMW)","volume":"103 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"A Literature Review on Mining Cyberthreat Intelligence from Unstructured Texts\",\"authors\":\"Md. Rayhanur Rahman, Rezvan Mahdavi-Hezaveh, L. Williams\",\"doi\":\"10.1109/ICDMW51313.2020.00075\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Cyberthreat defense mechanisms have become more proactive these days, and thus leading to the increasing incorporation of cyberthreat intelligence (CTI). Cybersecurity researchers and vendors are powering the CTI with large volumes of unstructured textual data containing information on threat events, threat techniques, and tactics. Hence, extracting cyberthreat-relevant information through text mining is an effective way to obtain actionable CTI to thwart cyberattacks. The goal of this research is to aid cybersecurity researchers understand the source, purpose, and approaches for mining cyberthreat intelligence from unstructured text through a literature review of peer-reviewed studies on this topic. We perform a literature review to identify and analyze existing research on mining CTI. By using search queries in the bibliographic databases, 28,484 articles are found. From those, 38 studies are identified through the filtering criteria which include removing duplicates, non-English, non-peer-reviewed articles, and articles not about mining CTI. We find that the most prominent sources of unstructured threat data are the threat reports, Twitter feeds, and posts from hackers and security experts. We also observe that security researchers mined CTI from unstructured sources to extract Indicator of Compromise (IoC), threat-related topic, and event detection. Finally, natural language processing (NLP) based approaches: topic classification; keyword identification; and semantic relationship extraction among the keywords are mostly availed in the selected studies to mine CTI information from unstructured threat sources.\",\"PeriodicalId\":426846,\"journal\":{\"name\":\"2020 International Conference on Data Mining Workshops (ICDMW)\",\"volume\":\"103 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 International Conference on Data Mining Workshops (ICDMW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDMW51313.2020.00075\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 International Conference on Data Mining Workshops (ICDMW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDMW51313.2020.00075","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 8

摘要

如今，网络威胁防御机制变得更加主动，从而导致越来越多地纳入网络威胁情报(CTI)。网络安全研究人员和供应商正在为CTI提供大量非结构化文本数据，其中包含有关威胁事件、威胁技术和策略的信息。因此，通过文本挖掘提取网络威胁相关信息是获得可操作的CTI以阻止网络攻击的有效途径。本研究的目的是通过对该主题的同行评审研究的文献综述，帮助网络安全研究人员了解从非结构化文本中挖掘网络威胁情报的来源、目的和方法。我们进行了文献综述，以识别和分析现有的研究挖掘CTI。通过对文献数据库的检索，共检索到28,484篇文献。从中，通过过滤标准确定了38项研究，其中包括删除重复，非英语，非同行评审的文章以及与CTI无关的文章。我们发现，非结构化威胁数据最主要的来源是来自黑客和安全专家的威胁报告、Twitter feed和帖子。我们还观察到，安全研究人员从非结构化来源挖掘CTI，以提取妥协指标(IoC)、威胁相关主题和事件检测。最后，基于自然语言处理(NLP)的方法:主题分类;关键字识别;所选研究主要利用关键字之间的语义关系提取，从非结构化威胁源中挖掘CTI信息。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

A Literature Review on Mining Cyberthreat Intelligence from Unstructured Texts

Cyberthreat defense mechanisms have become more proactive these days, and thus leading to the increasing incorporation of cyberthreat intelligence (CTI). Cybersecurity researchers and vendors are powering the CTI with large volumes of unstructured textual data containing information on threat events, threat techniques, and tactics. Hence, extracting cyberthreat-relevant information through text mining is an effective way to obtain actionable CTI to thwart cyberattacks. The goal of this research is to aid cybersecurity researchers understand the source, purpose, and approaches for mining cyberthreat intelligence from unstructured text through a literature review of peer-reviewed studies on this topic. We perform a literature review to identify and analyze existing research on mining CTI. By using search queries in the bibliographic databases, 28,484 articles are found. From those, 38 studies are identified through the filtering criteria which include removing duplicates, non-English, non-peer-reviewed articles, and articles not about mining CTI. We find that the most prominent sources of unstructured threat data are the threat reports, Twitter feeds, and posts from hackers and security experts. We also observe that security researchers mined CTI from unstructured sources to extract Indicator of Compromise (IoC), threat-related topic, and event detection. Finally, natural language processing (NLP) based approaches: topic classification; keyword identification; and semantic relationship extraction among the keywords are mostly availed in the selected studies to mine CTI information from unstructured threat sources.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助