Towards more realistic evaluations: The impact of label delays in malware detection pipelines

Computers & Security (impact factor 5.4; JCR Q1, Computer Science, Information Systems; CAS Zone 2, Computer Science). Publication date: 2025-01-01; online: 2024-09-19. DOI: 10.1016/j.cose.2024.104122

Marcus Botacin, Heitor Gomes

Computers & Security, vol. 148, Article 104122. Full text: https://www.sciencedirect.com/science/article/pii/S0167404824004279

Abstract

Developing and evaluating malware classification pipelines that reflect real-world needs is as vital for protecting users as it is hard to achieve. In many cases, the experimental conditions under which an approach is developed do not match its deployment settings, so the solution fails to achieve the desired results. In this work, we explore how unrealistic the design and evaluation decisions in the literature are. In particular, we shed light on the problem of label delays, i.e., the assumption that ground-truth labels for classifier retraining are always available, whereas in the real world they take significant time to produce, which also opens a significant window of opportunity for attackers. Among diverse aspects, our analyses address: (1) the use of metrics that do not account for the effect of time; (2) the occurrence of concept drift and idealized assumptions about the amount of drift data a system can handle; and (3) idealized assumptions about the availability of oracle data for drift detection, and the need to rely on pseudo-labels to mitigate drift-related delays.

We present experiments based on a newly proposed exposure metric showing that label delays caused by limited analysis queue sizes pose a significant challenge for detection (e.g., up to 75% greater attack opportunity in the real world than in the experimental setting) and that pseudo-labels help mitigate these delays (reducing the detection loss to only 30% of its original value).
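The abstract does not define the exposure metric, so the following is only a toy illustration under our own assumptions, not the paper's method. It models a first-in, first-out analysis queue whose analysts (the oracle) can label at most a fixed number of samples per day, and a hypothetical exposure proxy that sums the days each malicious sample waits for its ground-truth label. The names `Sample`, `simulate_label_delays`, `exposure_days`, and `capacity_per_day` are hypothetical. An unlimited-capacity queue mirrors the common experimental assumption that labels are available immediately.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Sample:
    sample_id: int
    arrival_day: int
    malicious: bool


def simulate_label_delays(arrivals_per_day, capacity_per_day):
    """Return {sample_id: label_day} for a FIFO analysis queue (hypothetical model).

    arrivals_per_day[d] lists the samples submitted on day d; at most
    capacity_per_day samples are labeled per day, at the end of that day.
    """
    if capacity_per_day < 1:
        raise ValueError("capacity_per_day must be at least 1")
    queue = deque()
    label_day = {}
    day = 0
    # Keep running until every submitted sample has been labeled.
    while day < len(arrivals_per_day) or queue:
        if day < len(arrivals_per_day):
            queue.extend(arrivals_per_day[day])
        for _ in range(min(capacity_per_day, len(queue))):
            sample = queue.popleft()
            label_day[sample.sample_id] = day
        day += 1
    return label_day


def exposure_days(samples, label_day):
    """Hypothetical exposure proxy: total days malicious samples wait unlabeled."""
    return sum(label_day[s.sample_id] - s.arrival_day for s in samples if s.malicious)


# Toy workload: 4 submissions per day for 3 days; even-numbered samples are malicious.
samples = [Sample(i, i // 4, i % 2 == 0) for i in range(12)]
arrivals = [[s for s in samples if s.arrival_day == d] for d in range(3)]

print(exposure_days(samples, simulate_label_delays(arrivals, capacity_per_day=4)))  # 0
print(exposure_days(samples, simulate_label_delays(arrivals, capacity_per_day=2)))  # 9
```

With enough capacity every sample is labeled on the day it arrives, so the proxy is zero; halving the capacity lets the backlog grow and the last submissions wait three days for a label. Any real evaluation would substitute the paper's own exposure definition and measured queue sizes.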

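Source journal: Computers & Security (Engineering & Technology: Computer Science, Information Systems)

CiteScore: 12.40
Self-citation rate: 7.10%
Articles published: 365
Review time: 10.7 months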

Journal description: Computers & Security is the most respected technical journal in the IT security field. With its high-profile editorial board and informative regular features and columns, the journal is essential reading for IT security professionals around the world. Computers & Security provides you with a unique blend of leading-edge research and sound practical management advice. It is aimed at the professional involved with computer security, audit, control and data integrity in all sectors: industry, commerce and academia. Recognized worldwide as the primary source of reference for applied research and technical expertise, it is your first step to fully secure systems.