A novel approach for application classification with encrypted traffic using BERT and packet headers

Computer Networks (IF 4.4; Zone 2, Computer Science; JCR Q1, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE) | Pub Date: 2024-09-01 | DOI: 10.1016/j.comnet.2024.110747
Citations: 0

Abstract


Recent years have seen substantial advances in Internet technology, along with environmental changes, which have led to the emergence of various security issues. Applications that encrypt network traffic for various types of services are also growing explosively. Classifying applications within encrypted traffic is therefore an important research problem for both secure network management and efficient bandwidth management. In encrypted traffic the payload itself is encrypted, so classifying applications by signatures extracted from plaintext is no longer viable. Most applications in public datasets for encrypted traffic classification are collected with the same IP address and port number, which makes the 5-tuple a strong identifier. However, the 5-tuple carries many characteristics of the traffic collection environment and of individual users rather than intrinsic features of the applications themselves. When classifying applications in encrypted traffic, it is therefore advisable to use header information that excludes both the 5-tuple and the payload. This paper proposes a novel service type and application classification system based on Bidirectional Encoder Representations from Transformers (BERT) that uses packet header information from encrypted traffic. The proposed system ensures the accuracy and generalization performance of the classification model by using only the header information from traffic packets, excluding the 5-tuple and payload. Further, to preserve the characteristics and semantic meaning of an encrypted traffic packet, sentences made up of 2-byte tokens are used as input to BERT. Pre-training is performed on all sentences with label information excluded; fine-tuning then aligns the system with the service type and application classification objectives. Experiments on the publicly available ISCX VPN-nonVPN dataset show that the proposed model achieves F1-scores, the key performance measure, of 99.24% in service type classification and 98.74% in application classification. This capability can be used for maintaining the confidentiality of encrypted traffic, network security monitoring, Quality of Service (QoS), and traffic management in complex IT environments.
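The header-only, 2-byte-token input described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how tokens are cut or how the 5-tuple is excluded, so the sketch assumes non-overlapping 2-byte tokens, IPv4 over TCP only, and zeroing of the 5-tuple fields (rather than deleting them) so that the remaining fields stay 2-byte aligned. The function name `header_tokens` and the sample packet are hypothetical.

```python
import struct


def header_tokens(packet: bytes) -> list[str]:
    """Turn an IPv4/TCP packet into 2-byte hex tokens from its headers only.

    Assumptions of this sketch (not taken from the paper):
    - the payload is dropped by cutting the packet after the TCP header;
    - the 5-tuple fields (protocol, addresses, ports) are zeroed, not removed;
    - tokens are non-overlapping 2-byte words written as 4 hex digits.
    """
    if packet[0] >> 4 != 4 or packet[9] != 6:
        raise ValueError("this sketch only handles IPv4 + TCP")
    ihl = (packet[0] & 0x0F) * 4              # IPv4 header length in bytes
    tcp_len = (packet[ihl + 12] >> 4) * 4     # TCP header length in bytes
    header = bytearray(packet[:ihl + tcp_len])  # payload is cut off here

    header[9] = 0                    # IP protocol number
    header[12:20] = bytes(8)         # source and destination IP addresses
    header[ihl:ihl + 4] = bytes(4)   # source and destination TCP ports

    return [header[i:i + 2].hex() for i in range(0, len(header), 2)]


# Hypothetical sample packet: 20-byte IPv4 header, 20-byte TCP header, 4 payload bytes.
ip_header = bytes([0x45, 0x00, 0x00, 0x2C, 0x12, 0x34, 0x40, 0x00,
                   0x40, 0x06, 0xAB, 0xCD, 10, 0, 0, 1, 10, 0, 0, 2])
tcp_header = struct.pack("!HHIIBBHHH", 443, 51000, 1, 2, 0x50, 0x18, 0x2000, 0, 0)
payload = b"\x17\x03\x03\x00"

tokens = header_tokens(ip_header + tcp_header + payload)
sentence = " ".join(tokens)  # one "sentence" per packet, ready for a BERT-style tokenizer
print(sentence)
```

Each packet becomes one space-separated sentence of 4-hex-digit words, so the vocabulary is bounded by 65,536 tokens plus special tokens. Note that the IP and TCP checksums still depend on the original addresses and ports; whether such fields should also be masked is a design choice this sketch leaves open.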

Source journal
Computer Networks (Engineering & Technology: Telecommunications)
CiteScore: 10.80
Self-citation rate: 3.60%
Articles published: 434
Review time: 8.6 months
About the journal: Computer Networks is an international, archival journal providing a publication vehicle for complete coverage of all topics of interest to those involved in the computer communications networking area. The audience includes researchers, managers and operators of networks as well as designers and implementors. The Editorial Board will consider any material for publication that is of interest to those groups.