BContext2Name:基于多标签学习和神经网络的剥离二进制文件命名函数

IF 3.7 3区 计算机科学 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Journal of Cloud Computing-Advances Systems and Applications Pub Date : 2023-07-01 DOI:10.1109/CSCloud-EdgeCom58631.2023.00037
Bing Xia, Yunxiang Ge, Ruinan Yang, Jiabin Yin, Jianmin Pang, Chongjun Tang
{"title":"BContext2Name:基于多标签学习和神经网络的剥离二进制文件命名函数","authors":"Bing Xia, Yunxiang Ge, Ruinan Yang, Jiabin Yin, Jianmin Pang, Chongjun Tang","doi":"10.1109/CSCloud-EdgeCom58631.2023.00037","DOIUrl":null,"url":null,"abstract":"Conducting binary function naming helps reverse engineers understand the internal workings of the code and perform malicious code analysis without accessing the source code. However, the loss of debugging information poses the challenge of insufficient high-level semantic information description for stripping binary code function naming. Meanwhile, the existing binary function naming scheme has one function label for only one sample. The long-tail effect of function labels for a single sample makes the machine learning-based prediction models face the challenge. To obtain a function correlation label and improve the propensity score of uncommon tail labels, we propose a multi-label learning-based binary function naming model BContext2Name. This model automatically generates relevant labels for binary function naming by function context information with the help of PfastreXML model. The experimental results show that BContext2Name can enrich function labels and alleviate the long-tail effect that exists for a single sample class. To obtain high-level semantics of binary functions, we align pseudocode and basic blocks based on disassembly and decompilation, identify concrete or abstract values of API parameters by variable tracking, and construct API-enhanced control flow graphs. Finally, a seq2seq neural network translation model with attention mechanism is constructed between function multi-label learning and enhanced control flow graphs. Experiments on the dataset reveal that the F1 values of the BContext2Name model improve by 3.55% and 15.23% over the state-of-the-art XFL and Nero, respectively. This indicates that function multi-label learning can provide accurate labels for binary functions and can help reverse analysts understand the inner working mechanism of binary code. Code and data for this evaluation are available at https://github.com/CSecurityZhongYuan/BContext2Name.","PeriodicalId":56007,"journal":{"name":"Journal of Cloud Computing-Advances Systems and Applications","volume":"22 1","pages":"167-172"},"PeriodicalIF":3.7000,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"BContext2Name: Naming Functions in Stripped Binaries with Multi-Label Learning and Neural Networks\",\"authors\":\"Bing Xia, Yunxiang Ge, Ruinan Yang, Jiabin Yin, Jianmin Pang, Chongjun Tang\",\"doi\":\"10.1109/CSCloud-EdgeCom58631.2023.00037\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Conducting binary function naming helps reverse engineers understand the internal workings of the code and perform malicious code analysis without accessing the source code. However, the loss of debugging information poses the challenge of insufficient high-level semantic information description for stripping binary code function naming. Meanwhile, the existing binary function naming scheme has one function label for only one sample. The long-tail effect of function labels for a single sample makes the machine learning-based prediction models face the challenge. To obtain a function correlation label and improve the propensity score of uncommon tail labels, we propose a multi-label learning-based binary function naming model BContext2Name. This model automatically generates relevant labels for binary function naming by function context information with the help of PfastreXML model. The experimental results show that BContext2Name can enrich function labels and alleviate the long-tail effect that exists for a single sample class. To obtain high-level semantics of binary functions, we align pseudocode and basic blocks based on disassembly and decompilation, identify concrete or abstract values of API parameters by variable tracking, and construct API-enhanced control flow graphs. Finally, a seq2seq neural network translation model with attention mechanism is constructed between function multi-label learning and enhanced control flow graphs. Experiments on the dataset reveal that the F1 values of the BContext2Name model improve by 3.55% and 15.23% over the state-of-the-art XFL and Nero, respectively. This indicates that function multi-label learning can provide accurate labels for binary functions and can help reverse analysts understand the inner working mechanism of binary code. Code and data for this evaluation are available at https://github.com/CSecurityZhongYuan/BContext2Name.\",\"PeriodicalId\":56007,\"journal\":{\"name\":\"Journal of Cloud Computing-Advances Systems and Applications\",\"volume\":\"22 1\",\"pages\":\"167-172\"},\"PeriodicalIF\":3.7000,\"publicationDate\":\"2023-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Cloud Computing-Advances Systems and Applications\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1109/CSCloud-EdgeCom58631.2023.00037\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Cloud Computing-Advances Systems and Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1109/CSCloud-EdgeCom58631.2023.00037","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

摘要

进行二进制函数命名有助于逆向工程师理解代码的内部工作原理,并在不访问源代码的情况下执行恶意代码分析。然而,调试信息的丢失给剥离二进制代码函数命名带来了高级语义信息描述不足的挑战。同时,现有的二进制函数命名方案只对一个样本使用一个函数标签。单个样本的函数标签的长尾效应使得基于机器学习的预测模型面临挑战。为了获得函数相关标签并提高不常见尾标签的倾向得分,我们提出了一种基于多标签学习的二元函数命名模型BContext2Name。该模型借助PfastreXML模型,根据函数上下文信息自动生成二进制函数命名的相关标签。实验结果表明,BContext2Name可以丰富函数标签,减轻单个样本类存在的长尾效应。为了获得二进制函数的高级语义,我们基于反汇编和反编译对伪代码和基本块进行对齐,通过变量跟踪识别API参数的具体或抽象值,并构建API增强控制流图。最后,在函数多标签学习和增强控制流图之间构建了一个具有注意机制的seq2seq神经网络翻译模型。在数据集上的实验表明,BContext2Name模型的F1值比最先进的XFL和Nero分别提高了3.55%和15.23%。这表明函数多标签学习可以为二进制函数提供准确的标签,有助于逆向分析人员了解二进制代码的内部工作机制。此评估的代码和数据可在https://github.com/CSecurityZhongYuan/BContext2Name上获得。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
BContext2Name: Naming Functions in Stripped Binaries with Multi-Label Learning and Neural Networks
Conducting binary function naming helps reverse engineers understand the internal workings of the code and perform malicious code analysis without accessing the source code. However, the loss of debugging information poses the challenge of insufficient high-level semantic information description for stripping binary code function naming. Meanwhile, the existing binary function naming scheme has one function label for only one sample. The long-tail effect of function labels for a single sample makes the machine learning-based prediction models face the challenge. To obtain a function correlation label and improve the propensity score of uncommon tail labels, we propose a multi-label learning-based binary function naming model BContext2Name. This model automatically generates relevant labels for binary function naming by function context information with the help of PfastreXML model. The experimental results show that BContext2Name can enrich function labels and alleviate the long-tail effect that exists for a single sample class. To obtain high-level semantics of binary functions, we align pseudocode and basic blocks based on disassembly and decompilation, identify concrete or abstract values of API parameters by variable tracking, and construct API-enhanced control flow graphs. Finally, a seq2seq neural network translation model with attention mechanism is constructed between function multi-label learning and enhanced control flow graphs. Experiments on the dataset reveal that the F1 values of the BContext2Name model improve by 3.55% and 15.23% over the state-of-the-art XFL and Nero, respectively. This indicates that function multi-label learning can provide accurate labels for binary functions and can help reverse analysts understand the inner working mechanism of binary code. Code and data for this evaluation are available at https://github.com/CSecurityZhongYuan/BContext2Name.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Journal of Cloud Computing-Advances Systems and Applications
Journal of Cloud Computing-Advances Systems and Applications Computer Science-Computer Networks and Communications
CiteScore
6.80
自引率
7.50%
发文量
76
审稿时长
75 days
期刊介绍: The Journal of Cloud Computing: Advances, Systems and Applications (JoCCASA) will publish research articles on all aspects of Cloud Computing. Principally, articles will address topics that are core to Cloud Computing, focusing on the Cloud applications, the Cloud systems, and the advances that will lead to the Clouds of the future. Comprehensive review and survey articles that offer up new insights, and lay the foundations for further exploratory and experimental work, are also relevant.
期刊最新文献
Research on electromagnetic vibration energy harvester for cloud-edge-end collaborative architecture in power grid FedEem: a fairness-based asynchronous federated learning mechanism Adaptive device sampling and deadline determination for cloud-based heterogeneous federated learning Review on the application of cloud computing in the sports industry Improving cloud storage and privacy security for digital twin based medical records
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1