Cross-Language Binary-Source Code Matching Based on Rust and Intermediate Representation
Jiacheng Mao, Zukai Tang, Wenbi Rao
2023 IEEE 3rd International Conference on Computer Communication and Artificial Intelligence (CCAI), 2023-05-26
DOI: 10.1109/CCAI57533.2023.10201266 (https://doi.org/10.1109/CCAI57533.2023.10201266)
Citations: 0
Abstract
Binary-source code matching is a crucial task in computer security and software engineering: it supports reverse engineering by matching binary code with its corresponding source code, and it aids vulnerability detection by retrieving the binary code that corresponds to given source code. However, cross-language binary-source code matching remains largely unexplored; owing to a lack of suitable datasets, existing research focuses mostly on C/C++ and Java. This paper aims to address this gap by bringing a new language, Rust, to the task and conducting experiments on it. Rust is well suited to such a study: it is commonly compiled to LLVM-IR and binary files, and its compiler performs various transformations during compilation to ensure the memory safety, thread safety, and null safety of programs, which increases the prevalence of function clones. Matching binary and source code across Rust and C/C++ poses greater challenges and offers more research opportunities. We also re-trained the OSCAR model on the CodeNet dataset, calling the resulting model XLOSCAR, and evaluated its performance. Finally, we designed a series of experiments to analyze cross-language binary-source code matching between Rust and C/C++ and compared XLOSCAR with the general OSCAR model.