Yu Liu;Tong Li;Runzi Zhang;Zhao Jin;Mingkai Tong;Wenmao Liu;Yiting Wang;Zhen Yang
{"title":"A Context-Aware Clustering Approach for Assisting Operators in Classifying Security Alerts","authors":"Yu Liu;Tong Li;Runzi Zhang;Zhao Jin;Mingkai Tong;Wenmao Liu;Yiting Wang;Zhen Yang","doi":"10.1109/TSE.2024.3497588","DOIUrl":null,"url":null,"abstract":"Modern software has evolved from delivering software products to web services and applications, which need to be protected by security operation centers (SOC) against ubiquitous cyber attacks. Numerous security alerts are continuously generated every day, which have to be efficiently and correctly processed to identify potential threats. Many AIOps (artificial intelligence for IT operations) approaches have been proposed to (semi-)automate the inspection of alerts so as to reduce manual effort as much as possible. However, due to the ever-complicating attacks, a significant amount of manual work is still required in practice to ensure correct analysis results. In this paper, we propose a Context-Aware cLustering approach for cLassifying sEcurity alErts (CALLEE), which fully exploits the rich relationships among alerts in order to precisely identify similar alerts, significantly reducing the workload of SOC. Specifically, we first design a core conceptual model to capture connections among security alerts, based on which we establish corresponding heterogeneous information networks. Next, we systematically design a set of meta-paths to profile typical alert scenarios precisely, contributing to obtaining the representation of security alerts. We then cluster security alerts based on their contextual similarities, considering the tradeoff between the number of clusters and the homogeneity of each cluster. Finally, security operators only need to manually inspect a limited number of alerts within each cluster, pragmatically reducing their workload while ensuring the accuracy of alert classification. To evaluate the effectiveness of our approach, we collaborate with our industrial partner and pragmatically apply the approach to a real alert dataset. The results show that our approach can reduce the workload of SOC by 99.76%, outperforming baseline approaches. In addition, we further investigate the integration of our proposal with the real business scenario of our industrial partner. The feedback from practitioners shows that CALLEE is pragmatically applicable and helpful in industrial settings.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 1","pages":"153-171"},"PeriodicalIF":6.5000,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Software Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10752431/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
引用次数: 0
Abstract
Modern software has evolved from delivering software products to web services and applications, which need to be protected by security operation centers (SOC) against ubiquitous cyber attacks. Numerous security alerts are continuously generated every day, which have to be efficiently and correctly processed to identify potential threats. Many AIOps (artificial intelligence for IT operations) approaches have been proposed to (semi-)automate the inspection of alerts so as to reduce manual effort as much as possible. However, due to the ever-complicating attacks, a significant amount of manual work is still required in practice to ensure correct analysis results. In this paper, we propose a Context-Aware cLustering approach for cLassifying sEcurity alErts (CALLEE), which fully exploits the rich relationships among alerts in order to precisely identify similar alerts, significantly reducing the workload of SOC. Specifically, we first design a core conceptual model to capture connections among security alerts, based on which we establish corresponding heterogeneous information networks. Next, we systematically design a set of meta-paths to profile typical alert scenarios precisely, contributing to obtaining the representation of security alerts. We then cluster security alerts based on their contextual similarities, considering the tradeoff between the number of clusters and the homogeneity of each cluster. Finally, security operators only need to manually inspect a limited number of alerts within each cluster, pragmatically reducing their workload while ensuring the accuracy of alert classification. To evaluate the effectiveness of our approach, we collaborate with our industrial partner and pragmatically apply the approach to a real alert dataset. The results show that our approach can reduce the workload of SOC by 99.76%, outperforming baseline approaches. In addition, we further investigate the integration of our proposal with the real business scenario of our industrial partner. The feedback from practitioners shows that CALLEE is pragmatically applicable and helpful in industrial settings.
期刊介绍:
IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.