Improving Automated Labeling for ATT&CK Tactics in Malware Threat Reports

Eva Domschot, Ramyaa Ramyaa, Michael R. Smith
Journal Article, Digital Threats: Research and Practice (published 2023-05-17). DOI: 10.1145/3594553

Abstract

Once novel malware is detected, the security companies that discover it write threat reports. These reports often vary in the terminology used to describe the malware's behavior, which makes it difficult to compare reports on the same malware from different companies. To aid in the automated discovery of novel malware, it was recently proposed that novel malware could be detected by identifying behaviors. This assumes that a core set of behaviors is present in most, if not all, malware variants. However, there is a lack of malware datasets labeled with behaviors. Motivated by the need to label malware with a common set of behaviors, this work examines automating the process of labeling malware with the behaviors identified in malware threat reports, despite the variability of terminology. To do so, we examine several techniques from the natural language processing (NLP) domain. We find that most state-of-the-art NLP word embedding methods require large amounts of data and are trained on generic text corpora, so they miss the nuances related to information security. To address this, we use simple feature selection techniques. We find that these simple feature selection techniques generally outperform word embedding methods and, when used to predict MITRE ATT&CK tactics in threat reports, achieve a 6% increase in F0.5-score over prior work. Our work indicates that feature selection, which sophisticated NLP methods have commonly overlooked, is beneficial for information-security-related tasks, where more sophisticated NLP methods are not able to pick out the relevant information security terms.
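The abstract does not say which feature selection criterion or classifier was used. As a hedged illustration only, the following Python sketch assumes a chi-squared criterion over term presence, treats a single ATT&CK tactic as a binary label (framing multi-tactic labeling as one-vs-rest is an assumption here), and substitutes a deliberately trivial keyword-match rule for a real classifier. The function names and toy reports are hypothetical. The sketch also shows how an F-score with β = 0.5, the metric reported in the abstract, is computed; β < 1 weights precision more heavily than recall.

```python
# Sketch under stated assumptions: chi-squared selection, the keyword rule and
# the toy reports are illustrative, not the authors' actual method or data.
from collections import Counter


def chi2_scores(docs, labels):
    """Score each term by the chi-squared statistic of its presence vs. a binary label.

    docs: list of token lists; labels: list of 0/1 ints of the same length.
    Uses the 2x2 table of document counts (term present/absent x label 1/0).
    """
    n = len(docs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    df_all = Counter()  # number of documents containing the term
    df_pos = Counter()  # number of positive documents containing the term
    for tokens, y in zip(docs, labels):
        for term in set(tokens):
            df_all[term] += 1
            if y:
                df_pos[term] += 1
    scores = {}
    for term, present in df_all.items():
        a = df_pos[term]     # term present, label 1
        b = present - a      # term present, label 0
        c = n_pos - a        # term absent, label 1
        d = n_neg - b        # term absent, label 0
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def keyword_predict(docs, keywords):
    """Hypothetical toy classifier: predict 1 if any selected term occurs."""
    kw = set(keywords)
    return [1 if kw.intersection(tokens) else 0 for tokens in docs]


def f_beta(y_true, y_pred, beta=0.5):
    """F-beta score; beta < 1 weights precision more heavily than recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical toy data for one tactic (label 1 = report describes Persistence).
train_docs = [
    "creates registry run key for persistence".split(),
    "adds scheduled task to maintain persistence".split(),
    "encrypts files and demands ransom".split(),
    "exfiltrates data over http".split(),
]
train_labels = [1, 1, 0, 0]
test_docs = [
    "installs service for persistence".split(),
    "modifies registry run key".split(),
    "deletes shadow copies".split(),
]
test_labels = [1, 1, 0]

selected = select_top_k(chi2_scores(train_docs, train_labels), k=1)
preds = keyword_predict(test_docs, selected)
print(selected)                              # ['persistence']
print(round(f_beta(test_labels, preds), 3))  # 0.833
```

On the toy test set the rule has precision 1.0 and recall 0.5, so F0.5 is about 0.833 while F1 would be about 0.667, illustrating how F0.5 rewards precision more than recall. Note that the chi-squared statistic is symmetric, so terms indicative of the negative class can also rank highly.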