{"title":"Deeply fused flow and topology features for botnet detection based on a pretrained GCN","authors":"Xiaoyuan Meng , Bo Lang , Yuhao Yan , Yanxi Liu","doi":"10.1016/j.comcom.2025.108084","DOIUrl":null,"url":null,"abstract":"<div><div>The characteristics of botnets are mainly reflected in their network behaviors and the intercommunication relationships among their bots. The existing botnet detection methods typically use only one kind of feature, i.e., flow features or topological features; each feature type overlooks the other type of features and affects the resulting model performance. In this paper, for the first time, we propose a botnet detection model that uses a graph convolutional network (GCN) to deeply fuse flow features and topological features. We construct communication graphs from network traffic and represent node attributes with flow features. The extreme sample imbalance phenomenon exhibited by the existing public traffic datasets makes training a GCN model impractical. To address this problem, we propose a pretrained GCN framework that utilizes a public balanced artificial communication graph dataset to pretrain the GCN model, and the feature output obtained from the last hidden layer of the GCN model containing the flow and topology information is input into the Extra Tree classification model. Furthermore, our model can effectively detect command-and-control (C2) and peer-to-peer (P2P) botnets by simply adjusting the number of layers in the GCN. The experimental results obtained on public datasets demonstrate that our approach outperforms the current state-of-the-art botnet detection models. In addition, our model also performs well in real-world botnet detection scenarios.</div></div>","PeriodicalId":55224,"journal":{"name":"Computer Communications","volume":"233 ","pages":"Article 108084"},"PeriodicalIF":4.5000,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Communications","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0140366425000416","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
The characteristics of botnets are mainly reflected in their network behaviors and the intercommunication relationships among their bots. The existing botnet detection methods typically use only one kind of feature, i.e., flow features or topological features; each feature type overlooks the other type of features and affects the resulting model performance. In this paper, for the first time, we propose a botnet detection model that uses a graph convolutional network (GCN) to deeply fuse flow features and topological features. We construct communication graphs from network traffic and represent node attributes with flow features. The extreme sample imbalance phenomenon exhibited by the existing public traffic datasets makes training a GCN model impractical. To address this problem, we propose a pretrained GCN framework that utilizes a public balanced artificial communication graph dataset to pretrain the GCN model, and the feature output obtained from the last hidden layer of the GCN model containing the flow and topology information is input into the Extra Tree classification model. Furthermore, our model can effectively detect command-and-control (C2) and peer-to-peer (P2P) botnets by simply adjusting the number of layers in the GCN. The experimental results obtained on public datasets demonstrate that our approach outperforms the current state-of-the-art botnet detection models. In addition, our model also performs well in real-world botnet detection scenarios.
期刊介绍:
Computer and Communications networks are key infrastructures of the information society with high socio-economic value as they contribute to the correct operations of many critical services (from healthcare to finance and transportation). Internet is the core of today''s computer-communication infrastructures. This has transformed the Internet, from a robust network for data transfer between computers, to a global, content-rich, communication and information system where contents are increasingly generated by the users, and distributed according to human social relations. Next-generation network technologies, architectures and protocols are therefore required to overcome the limitations of the legacy Internet and add new capabilities and services. The future Internet should be ubiquitous, secure, resilient, and closer to human communication paradigms.
Computer Communications is a peer-reviewed international journal that publishes high-quality scientific articles (both theory and practice) and survey papers covering all aspects of future computer communication networks (on all layers, except the physical layer), with a special attention to the evolution of the Internet architecture, protocols, services, and applications.