Pub Date: 2024-11-05 | DOI: 10.1016/j.cose.2024.104192
Xiaojian Liu, Xinwei Guo, Wen Gu
Knowledge graph technology is widely used in network security design, analysis, and detection. By collecting, organizing, and mining various security knowledge, it provides scientific support for security decisions. Some public Security Knowledge Repositories (SKRs) are frequently used to construct security knowledge graphs, and the quality of these SKRs affects the efficiency and effectiveness of security analysis. At present, however, the identification of relational information among security knowledge elements is neither sufficient nor timely, and much key relational information is missing. To address this, we propose SecKG2vec, a security knowledge graph relational reasoning method based on the fusion embedding of semantic correlation and structural correlation. With SecKG2vec, the embedded vector simultaneously encodes semantic and structural characteristics and thus supports better relational reasoning. In qualitative evaluation and quantitative experiments against baseline methods, SecKG2vec performs better on both the relation reasoning task and the entity reasoning task, and shows potential for zero-shot scenario prediction.
"SecKG2vec: A novel security knowledge graph relational reasoning method based on semantic and structural fusion embedding," Computers & Security, Volume 149, Article 104192.
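The fusion idea can be sketched in a few lines. This toy (not the paper's actual model) fuses per-entity semantic and structural vectors by a weighted sum and scores a candidate triple TransE-style; the entity names, vector size, and fusion weight are all illustrative assumptions:

```python
import math
import random

random.seed(0)

# Hypothetical toy embeddings: in a real system the semantic vectors would
# come from a text encoder and the structural vectors from a graph model.
entities = ["CVE-2021-44228", "CWE-502"]
sem = {e: [random.gauss(0, 1) for _ in range(8)] for e in entities}
struct = {e: [random.gauss(0, 1) for _ in range(8)] for e in entities}

def fuse(e, alpha=0.5):
    # One simple fusion choice: weighted sum, then L2-normalize.
    v = [alpha * s + (1 - alpha) * g for s, g in zip(sem[e], struct[e])]
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def transe_score(h, r, t):
    # TransE-style plausibility: smaller ||h + r - t|| => more plausible.
    hv, tv = fuse(h), fuse(t)
    return -math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(hv, r, tv)))

rel = [random.gauss(0, 1) for _ in range(8)]  # e.g. an "exploits" relation
score = transe_score("CVE-2021-44228", rel, "CWE-502")
print(round(score, 3))
```

With fused vectors, one score function can rank both missing relations and missing entities, which is the kind of dual reasoning the abstract describes.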
Pub Date: 2024-11-05 | DOI: 10.1016/j.cose.2024.104178
César Gil, Javier Parra-Arnau, Jordi Forné
Personalized information systems are information-filtering systems that tailor information-exchange functionality to the specific interests of their users. The ability of these systems to profile users based on their search queries at Google, disclosed locations at Twitter, or rated movies at Netflix is, on the one hand, what enables such intelligent functionality, and on the other, the source of serious privacy concerns. Leveraging the principle of data minimization, we propose a data-generalization mechanism that protects users’ privacy against non-fully trusted personalized information systems. In our approach, users may disclose personal data to such systems when they feel comfortable doing so. When they do not, they may replace specific, sensitive data with more general and thus less sensitive data before sharing it with the personalized system in question. Generalization therefore protects user privacy to a certain extent, but clearly at the cost of some information loss. In this work, we model an optimized version of this mechanism mathematically and investigate theoretically some key properties of the privacy-utility trade-off it poses. Experimental results on two real-world datasets demonstrate how our approach contributes to privacy protection and show that it can outperform state-of-the-art perturbation techniques such as data forgery and suppression by providing higher utility at the same privacy level. On a practical level, the implications of our work for personalized online services are diverse. On the one hand, our mechanism allows each user to take charge of their own privacy individually, without needing to rely on third parties or to share resources with other users. On the other hand, it provides privacy designers and engineers with a new data-perturbative mechanism with which to evaluate their systems on data that can be generalized according to a hierarchy, notably spatial generalization, with practical application in popular location-based services. Overall, we contribute a data-perturbation mechanism for privacy protection against user profiling that is optimal, deterministic, and local, and that assumes an untrusted model of third parties.
"Privacy protection against user profiling through optimal data generalization," Computers & Security, Volume 148, Article 104178.
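Hierarchy-based generalization can be illustrated with a toy location taxonomy (the hierarchy and values here are invented, not from the paper's datasets): climbing the hierarchy trades specificity, and hence utility, for privacy:

```python
# Hypothetical location hierarchy; leaf values are most specific/sensitive.
parent = {
    "Times Square": "Manhattan",
    "Manhattan": "New York City",
    "New York City": "USA",
    "USA": None,  # root: most general, least sensitive
}

def generalize(value, levels):
    """Replace a value by an ancestor `levels` steps up its hierarchy."""
    for _ in range(levels):
        if parent.get(value) is None:
            break
        value = parent[value]
    return value

assert generalize("Times Square", 0) == "Times Square"  # full utility
assert generalize("Times Square", 2) == "New York City"  # partial disclosure
assert generalize("Times Square", 99) == "USA"  # saturates at the root
```

The mechanism in the paper additionally chooses the generalization level optimally per user; this sketch only shows the generalization operation itself.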
Pub Date: 2024-11-04 | DOI: 10.1016/j.cose.2024.104182
Mauro Allegretta, Giuseppe Siracusano, Roberto González, Marco Gramaglia, Juan Caballero
Internet Web and cloud services are routinely abused by malware, but the breadth of this abuse has not been thoroughly investigated. In this work, we quantitatively investigate it by leveraging data from the Cyber Threat Alliance (CTA), in which 36 security vendors share threat intelligence. We analyze CTA data collected over four years, from January 2020 until December 2023, comprising over one billion cyber-security observations, from which we extract 7.7M URLs and 1.8M domains related to malware. We complement this dataset with an active measurement in which we periodically attempt to download the content referenced by 33,876 recently reported malicious URLs. We investigate the following questions: How widespread is malware abuse of Internet services? How do the domains of abused Internet services differ? For what purposes are Internet services abused? How long do malicious resources remain active? Among other findings, we uncover broad abuse affecting 22K domains of Internet services, show that Internet services are largely abused to enable malware distribution, and find that malicious content on Internet services remains active longer than on malicious domains.
"Web of shadows: Investigating malware abuse of internet services," Computers & Security, Volume 149, Article 104182.
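The active-measurement side of such a study amounts to repeatedly probing reported URLs and recording how long they keep serving content. A minimal bookkeeping sketch follows; the URL and probe results are simulated, and a real crawler would issue HTTP requests in their place:

```python
import datetime as dt

class LivenessTracker:
    """Track how long reported malicious URLs keep serving content."""

    def __init__(self):
        self.first_alive = {}  # url -> first time a probe succeeded
        self.last_alive = {}   # url -> most recent successful probe

    def record(self, url, alive, when):
        if alive:
            self.first_alive.setdefault(url, when)
            self.last_alive[url] = when

    def lifetime(self, url):
        if url not in self.first_alive:
            return dt.timedelta(0)
        return self.last_alive[url] - self.first_alive[url]

t = LivenessTracker()
day = dt.datetime(2024, 1, 1)
# Simulated probe outcomes for one hypothetical reported URL.
t.record("http://evil.example/payload", True, day)
t.record("http://evil.example/payload", True, day + dt.timedelta(days=3))
t.record("http://evil.example/payload", False, day + dt.timedelta(days=5))
print(t.lifetime("http://evil.example/payload").days)  # 3
```

Aggregating such lifetimes per hosting category is what supports comparisons like "content on Internet services outlives content on malicious domains."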
Pub Date: 2024-11-03 | DOI: 10.1016/j.cose.2024.104185
Abdullah Al Mamun, Harith Al-Sahaf, Ian Welch, Seyit Camtepe
Advanced Persistent Threats (APTs) pose considerable challenges in the realm of cybersecurity, characterized by evolving tactics and complex evasion techniques that often outsmart traditional security measures and necessitate the development of more sophisticated detection methods. This study introduces Feature Evolution using Genetic Programming (FEGP), a novel method that leverages multi-tree Genetic Programming (GP) to construct and enhance features for APT detection. While GP has been widely utilized to tackle problems in many domains, our study focuses on adapting GP to the multifaceted landscape of APT detection. The proposed method automatically constructs discriminative features by combining the original features using mathematical operators. By leveraging GP, the system adapts to the evolving tactics employed by APTs, identifying APT activities with greater accuracy and reliability. To assess the efficacy of the proposed method, comprehensive experiments were conducted on widely used and publicly accessible APT datasets. Using the combination of constructed and original features on the DAPT-2020 dataset, FEGP achieved a balanced accuracy of 79.28%, surpassing the best comparative methods by an average of 2.12% in detecting APT stages. Additionally, utilizing only constructed features on the Unraveled dataset, FEGP achieved a balanced accuracy of 83.14%, a 3.73% improvement over the best comparative method. These findings underscore the importance of GP-based feature construction for APT detection, providing a pathway toward improved accuracy and efficiency in identifying APT activities. The comparative analysis against existing feature construction methods demonstrates FEGP’s effectiveness as a state-of-the-art method for multi-class APT classification. In addition to the performance evaluation, further analyses were conducted, encompassing a feature importance analysis and a detailed time analysis.
"Genetic programming for enhanced detection of Advanced Persistent Threats through feature construction," Computers & Security, Volume 149, Article 104185.
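GP-style feature construction combines original features with mathematical operators into expression trees. A minimal sketch, with invented flow-level feature names and a hand-picked tree standing in for an evolved one:

```python
import operator
import random

random.seed(1)
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def random_tree(features, depth=2):
    """Grow a random expression tree over feature names (GP-style init)."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(features)  # leaf: an original feature
    op = random.choice(list(OPS))
    return (op, random_tree(features, depth - 1), random_tree(features, depth - 1))

def evaluate(tree, row):
    """Evaluate a constructed feature on one sample (dict of raw features)."""
    if isinstance(tree, str):
        return row[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, row), evaluate(right, row))

row = {"bytes_sent": 3.0, "duration": 2.0, "n_flows": 5.0}
# A hand-written tree for illustration: (bytes_sent * duration) + n_flows.
tree = ("+", ("*", "bytes_sent", "duration"), "n_flows")
print(evaluate(tree, row))  # 11.0
```

In actual GP, populations of such trees would be evolved with crossover and mutation, selecting for trees whose outputs best discriminate APT stages.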
Pub Date: 2024-10-31 | DOI: 10.1016/j.cose.2024.104188
P. Vidyasri, S. Suresh
Phishing attacks have emerged as a major social engineering threat that affects businesses, governments, and general internet users. This work proposes a social engineering phishing detection technique based on Deep Learning (DL). Initially, website data is taken from the dataset. Then, Natural Language Processing (NLP) features such as bag of words, n-grams, hashtags, sentence length, Term Frequency-Inverse Document Frequency (TF-IDF), and all-caps usage are extracted, after which web feature extraction is carried out. Later, feature fusion is performed using Neyman similarity with a Deep Belief Network (DBN). Afterwards, oversampling is used for data augmentation to increase the number of training samples. Lastly, phishing attacks are detected by the proposed Fuzzy Deep Neural-Stacked Autoencoder (FDN-SA), developed by combining a Deep Neural Network (DNN) and a Deep Stacked Autoencoder (DSA). FDN-SA is evaluated in terms of accuracy, True Positive Rate (TPR), and True Negative Rate (TNR), achieving values of 0.920, 0.925, and 0.921, respectively.
"FDN-SA: Fuzzy deep neural-stacked autoencoder-based phishing attack detection in social engineering," Computers & Security, Volume 148, Article 104188.
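One of the NLP features listed above, TF-IDF, can be computed from scratch in a few lines; the toy documents below are invented phishing-style and benign examples, not drawn from the paper's dataset:

```python
import math
from collections import Counter

docs = [
    "verify your account now",              # phishing-flavored (invented)
    "account suspended verify immediately",  # phishing-flavored (invented)
    "meeting agenda for tomorrow",           # benign (invented)
]

def tfidf(docs):
    """Plain TF-IDF over whitespace tokens: tf(t, d) * log(N / df(t))."""
    n = len(docs)
    tokenized = [d.split() for d in docs]
    # Document frequency: number of docs containing each token.
    df = Counter(t for toks in tokenized for t in set(toks))
    out = []
    for toks in tokenized:
        tf = Counter(toks)
        out.append({t: (tf[t] / len(toks)) * math.log(n / df[t]) for t in tf})
    return out

vecs = tfidf(docs)
# "verify" appears in 2 of 3 docs, so its weight is lower than that of
# rarer tokens like "meeting" (1 of 3 docs).
print(round(vecs[0]["verify"], 4))
```

Bag-of-words and n-gram features follow the same counting pattern, with token pairs or triples as the units instead of single tokens.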
Pub Date: 2024-10-30 | DOI: 10.1016/j.cose.2024.104164
Raviha Khan, Hossien B. Eldeeb, Brahim Mefgouda, Omar Alhussein, Hani Saleh, Sami Muhaidat
Internet of Things (IoT) networks have been deployed widely, making device authentication a crucial requirement that poses challenges related to security vulnerabilities, power consumption, and maintenance overheads. While current cryptographic techniques secure device communication, storing keys in Non-Volatile Memory (NVM) poses challenges for edge devices. Physically Unclonable Functions (PUFs) offer robust hardware-based authentication but introduce complexities such as hardware production and maintenance expenses and susceptibility to aging effects. This paper’s main contribution is a novel scheme based on split learning that utilizes an encoder-decoder architecture at the device and server nodes, first to create a Virtual PUF (VPUF) that addresses the shortcomings of hardware PUFs, and second to perform device authentication. The proposed VPUF reduces maintenance and power demands compared to a hardware PUF while enhancing security by transmitting latent-space representations of responses between the node and the server. Moreover, since the encoder is placed on the node while the decoder is on the server, this approach further reduces the computational load and processing time on the resource-constrained node. The results demonstrate the effectiveness of the proposed VPUF scheme in modeling the behavior of a hardware-based PUF. Additionally, we investigate the impact of Gaussian noise in the communication channel between the server and the node on system performance. The results further reveal that the proposed scheme achieves an authentication accuracy of 100%, as measured by the validation rate of legitimate nodes. This highlights the superior performance of the proposed scheme in emulating the capabilities of a hardware-based PUF while providing secure and efficient authentication in IoT networks.
"Encoder decoder-based Virtual Physically Unclonable Function for Internet of Things device authentication using split-learning," Computers & Security, Volume 148, Article 104164.
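The split-learning arrangement can be sketched as follows. This is a shape-only toy: the real VPUF uses trained neural encoder and decoder networks, whereas here the "encoder" merely averages adjacent response values and the "decoder" expands them back:

```python
# Device-side "encoder" compresses a PUF-style response so that only the
# latent representation travels over the channel; the server-side "decoder"
# reconstructs the response and checks it against the enrolled one.

def encode(response):
    """Runs on the constrained device: 8 values -> 4 latent values."""
    return [(response[i] + response[i + 1]) / 2
            for i in range(0, len(response), 2)]

def decode(latent):
    """Runs on the server: crude expansion back to 8 values."""
    out = []
    for v in latent:
        out.extend([v, v])
    return out

def authenticate(enrolled, received_latent, tol=0.5):
    """Accept if the reconstruction is close enough to the enrolled response."""
    rec = decode(received_latent)
    return all(abs(a - b) <= tol for a, b in zip(enrolled, rec))

enrolled = [1, 1, 0, 0, 1, 0, 1, 1]        # response stored at enrollment
latent = encode([1, 1, 0, 0, 1, 0, 1, 1])  # device transmits 4 values, not 8
print(authenticate(enrolled, latent))      # True
```

Transmitting the latent rather than the raw response is what keeps an eavesdropper from directly learning challenge-response pairs, and placing only the encoder on the device is what keeps the device-side cost low.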
Pub Date: 2024-10-30 | DOI: 10.1016/j.cose.2024.104180
Fengrui Xiao, Shuangwu Chen, Jian Yang, Huasen He, Xiaofeng Jiang, Xiaobin Tan, Dong Jin
Correlating individual alerts to reconstruct attack scenarios has become a critical issue in identifying multi-step attack paths. Most existing reconstruction approaches depend on external expertise, such as attack templates or attack graphs, to identify known attack patterns, and are therefore incapable of uncovering unknown attack patterns that exceed prior knowledge. Recently, several expertise-independent methods have utilized alert similarity or statistical correlations to reconstruct multi-step attacks, but these methods often miss rare yet high-risk events. The key to overcoming these drawbacks lies in discovering the potential causalities between security alerts. In this paper, we propose GRAIN, a novel graph neural network and reinforcement learning aided causality discovery approach for multi-step attack scenario reconstruction that does not rely on any external expertise or prior knowledge. By matching the similarity between alerts’ attack semantics, we first remove redundant alerts to alleviate alert fatigue. We then correlate the remaining alerts into alert causal graphs that embody the causalities between attack incidents via causality discovery, and employ a graph neural network to evaluate the causal effect between correlated alerts. Given that the alerts triggered by multi-step attacks have the maximum causal effect, we utilize reinforcement learning to screen out authentic causal relationships. Extensive evaluations on four public multi-step attack datasets demonstrate that GRAIN significantly outperforms existing methods in terms of accuracy and efficiency, providing a robust solution for identifying and analyzing sophisticated multi-step attacks.
"GRAIN: Graph neural network and reinforcement learning aided causality discovery for multi-step attack scenario reconstruction," Computers & Security, Volume 148, Article 104180.
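To make the alert-correlation step concrete, here is a deliberately crude sketch that draws a candidate causal edge between two alerts when they are close in time and one alert's destination host is the other's source. GRAIN's actual causality discovery, GNN scoring, and RL-based screening are far more involved; the alert tuples below are invented:

```python
# Toy alerts: (timestamp_seconds, src_host, dst_host, signature).
alerts = [
    (0, "10.0.0.5", "10.0.0.9", "port-scan"),
    (30, "10.0.0.9", "10.0.0.9", "priv-esc"),
    (500, "10.0.0.7", "10.0.0.2", "port-scan"),
]

def candidate_edges(alerts, window=120):
    """Link alert i -> j if j follows i within `window` seconds and
    j originates from the host that alert i targeted."""
    edges = []
    for i, (ti, _, di, si) in enumerate(alerts):
        for tj, sj, _, gj in alerts[i + 1:]:
            if tj - ti <= window and sj == di:
                edges.append((si, gj))
    return edges

print(candidate_edges(alerts))  # [('port-scan', 'priv-esc')]
```

Such candidate edges form the raw causal graph; the role of the learned components is then to keep only the edges that reflect authentic causality rather than coincidence.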
Pub Date: 2024-10-29 | DOI: 10.1016/j.cose.2024.104177
Peng Wu, Mohan Gao, Fuhui Sun, Xiaoyan Wang, Li Pan
The growing variety of malicious software, i.e., malware, has caused great damage and economic loss to computer systems. The API call sequence of a malware sample reflects its dynamic behavior during execution, which is difficult to disguise. Therefore, API call sequences can serve as a robust feature for the detection and classification of malware. The statistical analysis presented in this paper reveals two distinct characteristics within the API call sequences of different malware: (1) an API existence feature, caused by frequent calls to APIs with special functions, and (2) an API transition feature, caused by frequent calls to special API subsequence patterns. Based on these two characteristics, this paper proposes MINES, a Multi-perspective apI call sequeNce bEhavior fuSion malware classification method. Specifically, the API existence features from different perspectives are described by two graphs that model the diverse, rich, and complex existence relationships between APIs, and we adopt a graph contrastive learning framework to extract the consistent shared API existence feature from the two graphs. Similarly, the API transition features of different hops are described by multi-order transition probability matrices. By treating each order as a channel, a CNN-based contrastive learning framework is adopted to extract the API transition feature. Finally, the two kinds of extracted features are fused to classify malware. Experiments on five datasets demonstrate the superiority of MINES over various state-of-the-art methods by a large margin.
{"title":"Multi-perspective API call sequence behavior analysis and fusion for malware classification","authors":"Peng Wu , Mohan Gao , Fuhui Sun , Xiaoyan Wang , Li Pan","doi":"10.1016/j.cose.2024.104177","DOIUrl":"10.1016/j.cose.2024.104177","url":null,"abstract":"<div><div>The growing variety of malicious software, i.e., malware, has caused great damage and economic loss to computer systems. The API call sequence of malware reflects its dynamic behavior during execution, which is difficult to disguise. Therefore, API call sequence can serve as a robust feature for the detection and classification of malware. The statistical analysis presented in this paper reveals two distinct characteristics within the API call sequences of different malware: (1) the API existence feature caused by frequent calls to the APIs with some special functions, and (2) the API transition feature caused by frequent calls to some special API subsequence patterns. Based on these two characteristics, this paper proposes MINES, a Multi-perspective apI call sequeNce bEhavior fuSion malware classification Method. Specifically, the API existence features from different perspectives are described by two graphs that model diverse rich and complex existence relationships between APIs, and we adopt the graph contrastive learning framework to extract the consistent shared API existence feature from two graphs. Similarly, the API transition features of different hops are described by the multi-order transition probability matrices. By treating each order as a channel, a CNN-based contrastive learning framework is adopted to extract the API transition feature. Finally, the two kinds of extracted features are fused to classify malware.
Experiments on five datasets demonstrate the superiority of MINES over various state-of-the-art methods by a large margin.</div></div>","PeriodicalId":51004,"journal":{"name":"Computers & Security","volume":"148 ","pages":"Article 104177"},"PeriodicalIF":4.8,"publicationDate":"2024-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142572054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-10-29 DOI: 10.1016/j.cose.2024.104181
Seunghoon Woo, Eunjin Choi, Heejo Lee
Public vulnerability reports assist developers in mitigating recurring threats caused by software vulnerabilities. However, security patches that lack effectiveness (1) may fail to completely resolve the target vulnerabilities after application (i.e., they require supplementary patches), or (2) cannot be directly applied to the codebase without modifying the patch code snippets. In this study, we systematically assessed the effectiveness of security patches from the perspective of their reliability and flexibility. We define a security patch as reliable or flexible, respectively, if it can resolve the vulnerability (1) without being complemented by additional patches or (2) without modifying the patch code snippets. Unlike previous studies that relied on manual inspection, we assess the reliability of a security patch by determining the presence of supplementary patches that complement it. To evaluate flexibility, we first locate vulnerable code in popular open-source software programs and then determine whether the security patch can be applied without any modifications. Our experiments on 8,100 security patches obtained from the National Vulnerability Database confirmed that one in ten of the collected patches lacked effectiveness. We discovered 476 (5.9%) unreliable patches that could still produce security issues after application; for 84.6% of the detected unreliable patches, the fact that a supplementary patch is required is not disclosed through public security reports. Furthermore, 377 (4.6%) security patches were observed to lack flexibility; we confirmed that 49.1% of the detected vulnerable code locations required patch modifications owing to syntax diversity. Our findings revealed that the effectiveness of security patches can directly affect software security, suggesting the need to enhance the vulnerability reporting process.
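The flexibility check described in the abstract (whether a patch applies without modifying its code snippets) can be sketched as a verbatim hunk match. This is a deliberately simplified illustration under assumed names; the actual study's matching is more sophisticated.

```python
def hunk_matches(target_lines, context_before, removed, context_after):
    """Hedged sketch of the flexibility notion: a patch can be applied
    without modification only if the hunk's context lines and the lines
    it removes appear verbatim in the target file. Illustrative only;
    this is not the authors' tooling."""
    window = context_before + removed + context_after
    n = len(window)
    # slide the hunk window over the file and look for an exact match
    return any(target_lines[i:i + n] == window
               for i in range(len(target_lines) - n + 1))

# toy example: the fork renamed a variable, so the same hunk no longer applies
upstream = ["if (!ptr)", "    return -1;", "use(ptr);"]
fork     = ["if (!p)",   "    return -1;", "use(p);"]
ok_upstream = hunk_matches(upstream, ["if (!ptr)"], ["    return -1;"], ["use(ptr);"])
ok_fork     = hunk_matches(fork,     ["if (!ptr)"], ["    return -1;"], ["use(ptr);"])
```

The fork case illustrates the "syntax diversity" the abstract reports: semantically identical code defeats a verbatim application, forcing manual patch modification.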
{"title":"A large-scale analysis of the effectiveness of publicly reported security patches","authors":"Seunghoon Woo, Eunjin Choi, Heejo Lee","doi":"10.1016/j.cose.2024.104181","DOIUrl":"10.1016/j.cose.2024.104181","url":null,"abstract":"<div><div>Public vulnerability reports assist developers in mitigating recurring threats caused by software vulnerabilities. However, security patches that lack effectiveness (1) may fail to completely resolve target vulnerabilities after application (<em>i.e.</em>, require supplementary patches), or (2) cannot be directly applied to the codebase without modifying the patch code snippets. In this study, we systematically assessed the effectiveness of security patches from the perspective of their reliability and flexibility. We define a security patch as reliable or flexible, respectively, if it can resolve the vulnerability (1) without being complemented by additional patches or (2) without modifying the patch code snippets. Unlike previous studies that relied on manual inspection, we assess the reliability of a security patch by determining the presence of supplementary patches that complement the security patch. To evaluate flexibility, we first locate vulnerable codes in popular open-source software programs and then determine whether the security patch can be applied without any modifications. Our experiments on 8,100 security patches obtained from the National Vulnerability Database confirmed that one in ten of the collected patches lacked effectiveness. We discovered 476 (5.9%) unreliable patches that could still produce security issues after application; for 84.6% of the detected unreliable patches, the fact that a supplementary patch is required is not disclosed through public security reports. Furthermore, 377 (4.6%) security patches were observed to lack flexibility; we confirmed that 49.1% of the detected vulnerable codes required patch modifications owing to syntax diversity. 
Our findings revealed that the effectiveness of security patches can directly affect software security, suggesting the need to enhance the vulnerability reporting process.</div></div>","PeriodicalId":51004,"journal":{"name":"Computers & Security","volume":"148 ","pages":"Article 104181"},"PeriodicalIF":4.8,"publicationDate":"2024-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142661693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-10-29 DOI: 10.1016/j.cose.2024.104175
Hongyu Lu, Jiajia Liu, Jimin Peng, Jiazhong Lu
To enhance the robustness of intrusion detection classifiers, we propose a Time Series-based Adversarial Attack Framework (TSAF) targeting the temporal characteristics of network traffic. Initially, adversarial samples are generated using the gradient calculations of CNNs, with updates iterated based on model loss. Different attack schemes are then applied to various traffic types and saved as generic adversarial perturbations. These time series-based perturbations are subsequently injected into the traffic stream. To apply the adversarial perturbations precisely, a masking mechanism is utilized. Our adversarial sample model was evaluated, and the results indicate that our samples can reduce the accuracy and recall rates for detecting four types of malicious network traffic, namely botnets, brute force attacks, port scanning, and web attacks, as well as degrade the detection performance on DDoS traffic. The CNN model’s accuracy dropped by up to 72.76%, and the SDAE model’s accuracy by up to 78.77%, with minimal perturbations. Our adversarial sample attack offers a new perspective in the field of cybersecurity and lays the groundwork for designing AI models that can resist adversarial attacks more effectively.
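The masked perturbation step described in the abstract can be sketched as a gradient-sign update gated by a binary mask, so that only selected traffic features are modified. All names, values, and the single-step formulation below are illustrative assumptions, not the TSAF implementation.

```python
import numpy as np

def masked_sign_step(x, grad, epsilon, mask):
    """Hedged sketch of a masked gradient-sign perturbation: the binary
    mask restricts which features of the traffic record may change
    (e.g. timing-related fields), leaving protocol-critical fields intact.
    Illustrative only; not the authors' TSAF code."""
    return x + epsilon * np.sign(grad) * mask

x    = np.array([0.2, 0.5, 0.9, 0.1])   # toy flow-feature vector
grad = np.array([0.3, -0.7, 0.0, 0.4])  # gradient of the model loss w.r.t. x
mask = np.array([1.0, 1.0, 0.0, 0.0])   # only the first two features are editable
x_adv = masked_sign_step(x, grad, epsilon=0.05, mask=mask)
```

Iterating this step while re-evaluating the model loss, as the abstract describes, yields a perturbation that can then be saved and replayed against similar traffic.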
{"title":"Adversarial attacks based on time-series features for traffic detection","authors":"Hongyu Lu, Jiajia Liu, Jimin Peng, Jiazhong Lu","doi":"10.1016/j.cose.2024.104175","DOIUrl":"10.1016/j.cose.2024.104175","url":null,"abstract":"<div><div>To enhance the robustness of intrusion detection classifiers, we propose a Time Series-based Adversarial Attack Framework (TSAF) targeting the temporal characteristics of network traffic. Initially, adversarial samples are generated using the gradient calculations of CNNs, with updates iterated based on model loss. Different attack schemes are then applied to various traffic types and saved as generic adversarial perturbations. These time series-based perturbations are subsequently injected into the traffic stream. To precisely implement the adversarial perturbations, a masking mechanism is utilized. Our adversarial sample model was evaluated, and the results indicate that our samples can reduce the accuracy and recall rates for detecting four types of malicious network traffic, including botnets, brute force, port scanning, and web attacks, as well as degrade the detection performance of DDoS traffic. The CNN model’s accuracy dropped by up to 72.76%, and the SDAE model’s accuracy by up to 78.77% with minimal perturbations. 
Our adversarial sample attack offers a new perspective in the field of cybersecurity and lays the groundwork for designing AI models that can resist adversarial attacks more effectively.</div></div>","PeriodicalId":51004,"journal":{"name":"Computers & Security","volume":"148 ","pages":"Article 104175"},"PeriodicalIF":4.8,"publicationDate":"2024-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142578161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}