Crafting Binary Protocol Reversing via Deep Learning With Knowledge-Driven Augmentation

Impact factor: 3.6; Zone 3 (Computer Science); JCR Q2 (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE); IEEE/ACM Transactions on Networking; Pub Date: 2024-10-10; DOI: 10.1109/TNET.2024.3468350

Sen Zhao;Shouguo Yang;Zhen Wang;Yongji Liu;Hongsong Zhu;Limin Sun

IEEE/ACM Transactions on Networking, vol. 32, no. 6, pp. 5399-5414. https://ieeexplore.ieee.org/document/10713284/

Citations: 0

Abstract

Protocol reverse engineering (PRE) is an instrumental tool in security research tasks such as protocol fuzzing and intrusion detection. Its primary objective is to uncover the format, semantics, and behavior of an unknown protocol without prior information. This paper presents DL-ProS2, a deep learning-based approach to binary protocol reversing that focuses on format segmentation and semantic inference from network traffic. Our approach rests on the observation that multi-scale features within network traffic are effective for identifying various types of fields and semantics. Building on this, DL-ProS2 employs an end-to-end model that integrates U-Net, a siamese network, and BiLSTM-CRF to analyze unknown protocol traffic and extract field boundaries and semantics. To address limited data diversity and coverage, we implement a knowledge-driven traffic simulation technique that uses ChatGPT to extract protocol knowledge from publicly available protocol documents, such as RFCs, as the foundational rules for the simulation. Empirical results show precision above 0.95 and recall above 0.97 for format segmentation and semantic inference on partially unknown protocols. The approach also retains effectiveness on completely unknown protocols, with average precision and recall of 0.69 and 0.62 for format segmentation, and 0.43 and 0.47 for semantic inference, respectively.
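To make the reported precision and recall figures for format segmentation concrete, the following is a minimal sketch of one common way such scores can be computed: by comparing predicted field boundaries against ground-truth boundaries at the byte-offset level. The abstract does not state the exact metric definition used for DL-ProS2, so the boundary-matching rule, the function names, and the example message layout below are all assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple


def boundaries(field_lengths: Iterable[int]) -> Set[int]:
    """Turn a sequence of field lengths (in bytes) into the set of
    byte offsets at which a field ends."""
    offsets: Set[int] = set()
    pos = 0
    for length in field_lengths:
        pos += length
        offsets.add(pos)
    return offsets


def boundary_precision_recall(
    true_lengths: Iterable[int], pred_lengths: Iterable[int]
) -> Tuple[float, float]:
    """Score a predicted segmentation against the true one.

    Assumption: a predicted boundary counts as correct only if it falls
    on exactly the same byte offset as a true boundary. The end-of-message
    boundary is included, which is a simplification.
    """
    true_b = boundaries(true_lengths)
    pred_b = boundaries(pred_lengths)
    hits = len(true_b & pred_b)
    precision = hits / len(pred_b) if pred_b else 0.0
    recall = hits / len(true_b) if true_b else 0.0
    return precision, recall


# Hypothetical 8-byte message: the true fields are 2, 2, and 4 bytes long,
# while the prediction over-segments the second field into two 1-byte fields.
precision, recall = boundary_precision_recall([2, 2, 4], [2, 1, 1, 4])
print(precision, recall)
```

In this hypothetical case every true boundary is recovered, so recall is perfect, while the one spurious boundary lowers precision. Over-segmentation and under-segmentation therefore affect the two scores differently, which is why both are reported.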


Source journal

IEEE/ACM Transactions on Networking (Engineering & Technology: Telecommunications)

CiteScore: 8.20

Self-citation rate: 5.40%

Articles published: 246

Review time: 4-8 weeks

Journal description: The high-level objective of IEEE/ACM Transactions on Networking is to publish high-quality, original research results derived from theoretical or experimental exploration of communication and computer networking. It covers all sorts of information transport networks over all sorts of physical layer technologies, both wireline (all kinds of guided media, e.g., copper, optical) and wireless (e.g., radio-frequency, acoustic such as underwater, infra-red), or hybrids of these. The journal welcomes applied contributions reporting on novel experiences and experiments with actual systems.
