{"title":"Automated WebAssembly Function Purpose Identification With Semantics-Aware Analysis","authors":"Alan Romano, Weihang Wang","doi":"10.1145/3543507.3583235","DOIUrl":null,"url":null,"abstract":"WebAssembly is a recent web standard built for better performance in web applications. The standard defines a binary code format to use as a compilation target for a variety of languages, such as C, C++, and Rust. The standard also defines a text representation for readability, although, WebAssembly modules are difficult to interpret by human readers, regardless of their experience level. This makes it difficult to understand and maintain any existing WebAssembly code. As a result, third-party WebAssembly modules need to be implicitly trusted by developers as verifying the functionality themselves may not be feasible. To this end, we construct WASPur, a tool to automatically identify the purposes of WebAssembly functions. To build this tool, we first construct an extensive collection of WebAssembly samples that represent the state of WebAssembly. Second, we analyze the dataset and identify the diverse use cases of the collected WebAssembly modules. We leverage the dataset of WebAssembly modules to construct semantics-aware intermediate representations (IR) of the functions in the modules. We encode the function IR for use in a machine learning classifier, and we find that this classifier can predict the similarity of a given function against known named functions with an accuracy rate of 88.07%. We hope our tool will enable inspection of optimized and minified WebAssembly modules that remove function names and most other semantic identifiers.","PeriodicalId":296351,"journal":{"name":"Proceedings of the ACM Web Conference 2023","volume":"22 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM Web Conference 2023","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3543507.3583235","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
WebAssembly is a recent web standard built for better performance in web applications. The standard defines a binary code format to use as a compilation target for a variety of languages, such as C, C++, and Rust. The standard also defines a text representation for readability, although, WebAssembly modules are difficult to interpret by human readers, regardless of their experience level. This makes it difficult to understand and maintain any existing WebAssembly code. As a result, third-party WebAssembly modules need to be implicitly trusted by developers as verifying the functionality themselves may not be feasible. To this end, we construct WASPur, a tool to automatically identify the purposes of WebAssembly functions. To build this tool, we first construct an extensive collection of WebAssembly samples that represent the state of WebAssembly. Second, we analyze the dataset and identify the diverse use cases of the collected WebAssembly modules. We leverage the dataset of WebAssembly modules to construct semantics-aware intermediate representations (IR) of the functions in the modules. We encode the function IR for use in a machine learning classifier, and we find that this classifier can predict the similarity of a given function against known named functions with an accuracy rate of 88.07%. We hope our tool will enable inspection of optimized and minified WebAssembly modules that remove function names and most other semantic identifiers.