NLabel: An Accurate Familial Clustering Framework for Large-scale Weakly-labeled Malware

Yannan Liu, Yabin Lai, Kaizhi Wei, Liang Gu, Zhengzheng Yan
{"title":"NLabel: An Accurate Familial Clustering Framework for Large-scale Weakly-labeled Malware","authors":"Yannan Liu, Yabin Lai, Kaizhi Wei, Liang Gu, Zhengzheng Yan","doi":"10.1109/TrustCom50675.2020.00039","DOIUrl":null,"url":null,"abstract":"Automatic family labeling for malware is in demand, especially for today's malware scale. While business Anti-Virus engines provide an efficient family labeling method, the raw labels tend to be inconsistent. Prior works mitigate such inconsistency by detecting the aliases and majority voting to obtain the final family label. However, these methods solve the inconsistency in a coarse-grained and vulnerable manner, and the obtained family label is inaccurate sometimes. In this work, we propose NLabel to conduct familial clustering based on AV engines' raw labels. On the one hand, NLabel uses word embedding techniques to capture the similarity among raw labels, transform the inconsistent labels of the same family into similar semantic representations, and mitigate the inconsistency at finer granularity. On the other hand, we propose a hierarchical family clustering method to boost the performance of large-scale data sets. Experimental results show that our method outperforms the SOTA.","PeriodicalId":221956,"journal":{"name":"2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TrustCom50675.2020.00039","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Automatic family labeling for malware is in demand, especially for today's malware scale. While business Anti-Virus engines provide an efficient family labeling method, the raw labels tend to be inconsistent. Prior works mitigate such inconsistency by detecting the aliases and majority voting to obtain the final family label. However, these methods solve the inconsistency in a coarse-grained and vulnerable manner, and the obtained family label is inaccurate sometimes. In this work, we propose NLabel to conduct familial clustering based on AV engines' raw labels. On the one hand, NLabel uses word embedding techniques to capture the similarity among raw labels, transform the inconsistent labels of the same family into similar semantic representations, and mitigate the inconsistency at finer granularity. On the other hand, we propose a hierarchical family clustering method to boost the performance of large-scale data sets. Experimental results show that our method outperforms the SOTA.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
NLabel:大规模弱标记恶意软件的精确家族聚类框架
对恶意软件的自动家族标记是有需求的,特别是对于今天的恶意软件规模。虽然商业反病毒引擎提供了一种有效的家族标签方法,但原始标签往往不一致。先前的工作通过检测别名和多数投票来获得最终的家族标签来缓解这种不一致。然而,这些方法解决不一致性的方法都是粗粒度的、易受攻击的,而且有时得到的家族标签也不准确。在这项工作中,我们提出了NLabel基于AV引擎的原始标签进行家族聚类。一方面,NLabel利用词嵌入技术捕获原始标签之间的相似性,将同族不一致的标签转化为相似的语义表示,并在更细的粒度上缓解不一致。另一方面,我们提出了一种层次族聚类方法来提高大规模数据集的性能。实验结果表明,该方法优于SOTA算法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Research on Stitching and Alignment of Mouse Carcass EM Images One Covert Channel to Rule Them All: A Practical Approach to Data Exfiltration in the Cloud MAUSPAD: Mouse-based Authentication Using Segmentation-based, Progress-Adjusted DTW Finding Geometric Medians with Location Privacy Multi-Input Functional Encryption: Efficient Applications from Symmetric Primitives
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1