Bottom Aggregating, Top Separating: An Aggregator and Separator Network for Encrypted Traffic Understanding

Impact factor: 8; Zone 1 (Computer Science); JCR Q1 (Computer Science, Theory & Methods); IEEE Transactions on Information Forensics and Security; Pub Date: 2025-01-13; DOI: 10.1109/TIFS.2025.3529316
Wei Peng;Lei Cui;Wei Cai;Wei Wang;Xiaoyu Cui;Zhiyu Hao;Xiaochun Yun
IEEE Transactions on Information Forensics and Security, vol. 20, pp. 1794-1806. Journal Article. Not open access. Source: https://ieeexplore.ieee.org/document/10839404/
Citations: 0

Abstract

Encrypted traffic classification is the task of identifying the application, service, or malware associated with encrypted network traffic. Previous methods have two main weaknesses. First, from the perspective of word-level (that is, byte-level) semantics, current methods apply pre-trained language models such as BERT, which have learned general natural-language knowledge, directly to byte-based traffic data. However, understanding traffic data differs from understanding words in natural language, and applying BERT directly to traffic data can disrupt the internal word-sense information and thereby degrade classification performance. Second, from the perspective of packet-level semantics, current methods mostly classify traffic implicitly, using abstract semantic features learned at the top layer, without explicitly separating those features into distinct category spaces, which leads to poor feature discriminability. In this paper, we propose a simple but effective Aggregator and Separator Network (ASNet) for encrypted traffic understanding, which consists of two core modules. A parameter-free word sense aggregator enables BERT to adapt rapidly to traffic data while preserving the complete word sense, without introducing additional model parameters. A category-constrained semantics separator, with task-aware prompts as the stimulus, explicitly conducts feature learning independently in the semantic spaces of different categories. Experiments on five datasets across seven tasks demonstrate that the proposed model achieves the current state-of-the-art results without pre-training, on both public benchmarks and a real-world collected traffic dataset. Statistical analyses and visualization experiments also validate the interpretability of the core modules. Importantly, ASNet does not need pre-training, which dramatically reduces the cost in computing power and time. The model code and dataset will be released at https://github.com/pengwei-iie/ASNET.
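The abstract does not say how the word sense aggregator combines byte-level pieces, so the following is only an illustration of what "parameter-free" aggregation can look like, not ASNet's implementation. It assumes (hypothetically) that a tokenizer has split each traffic "word" into several sub-tokens, and it mean-pools the sub-token vectors that share a word id. The function name `aggregate_word_sense`, the `word_ids` input, and the choice of mean pooling are all assumptions made for this sketch.

```python
from typing import List, Sequence

Vector = List[float]


def aggregate_word_sense(token_vecs: Sequence[Sequence[float]],
                         word_ids: Sequence[int]) -> List[Vector]:
    """Mean-pool consecutive token vectors that share a word id.

    Hypothetical helper: the pooling rule is an assumption, not taken
    from the paper. No learned weights are involved, so this step adds
    no model parameters.
    """
    if len(token_vecs) != len(word_ids):
        raise ValueError("token_vecs and word_ids must have equal length")

    sums: List[Vector] = []   # running sum of vectors for each word
    counts: List[int] = []    # number of sub-tokens seen for each word
    prev = None
    for vec, wid in zip(token_vecs, word_ids):
        if wid != prev:
            # A new word starts: open a fresh accumulator.
            sums.append([float(x) for x in vec])
            counts.append(1)
            prev = wid
        else:
            # Same word as the previous token: add into its accumulator.
            acc = sums[-1]
            for i, x in enumerate(vec):
                acc[i] += x
            counts[-1] += 1
    # Divide each sum by its token count to obtain the mean vector.
    return [[x / n for x in vec] for vec, n in zip(sums, counts)]


# Three sub-token vectors; the first two belong to word 0, the last to word 1.
example = aggregate_word_sense([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [0, 0, 1])
print(example)  # [[2.0, 3.0], [5.0, 6.0]]
```

Because the pooling uses only sums and divisions, the output sequence is shorter than the input but the encoder's weight count is unchanged, which is the property the abstract highlights. Any other fixed reduction (max, first-token selection) would be parameter-free in the same sense.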
Source journal
IEEE Transactions on Information Forensics and Security (Engineering & Technology: Electrical and Electronic Engineering)
CiteScore: 14.40
Self-citation rate: 7.40%
Articles published: 234
Review time: 6.5 months
About the journal: The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, surveillance, and systems applications that incorporate these features.
Latest articles from the journal
Learning Sequential Deception Defense Strategy against APT using Stackelberg Markov Game
MPA: Lightweight and Updatable Integrity Auditing for Decentralized Storage using Merkle Trees and Polynomial Commitments
TrapFlow: Controllable Website Fingerprinting Defense via Dynamic Backdoor Learning
Expressive and Fully Policy-Hidden Attribute-Based Searchable Encryption Scheme for Multi-Owner
MT-DEGCL: Multi-Task Encrypted Traffic Classification with Dual Embedding and Graph Contrastive Learning