Automated WebAssembly Function Purpose Identification With Semantics-Aware Analysis

Proceedings of the ACM Web Conference 2023. Pub Date: 2023-04-30. DOI: 10.1145/3543507.3583235

Alan Romano, Weihang Wang


Abstract

WebAssembly is a recent web standard built for better performance in web applications. The standard defines a binary code format to use as a compilation target for a variety of languages, such as C, C++, and Rust. The standard also defines a text representation for readability; however, WebAssembly modules remain difficult for human readers to interpret, regardless of their experience level. This makes it difficult to understand and maintain existing WebAssembly code. As a result, developers must implicitly trust third-party WebAssembly modules, since verifying the functionality themselves may not be feasible. To address this, we construct WASPur, a tool that automatically identifies the purposes of WebAssembly functions. To build this tool, we first construct an extensive collection of WebAssembly samples that represent the state of WebAssembly. Second, we analyze the dataset and identify the diverse use cases of the collected WebAssembly modules. We leverage the dataset of WebAssembly modules to construct semantics-aware intermediate representations (IR) of the functions in the modules. We encode the function IR for use in a machine learning classifier, and we find that this classifier can predict the similarity of a given function against known named functions with an accuracy rate of 88.07%. We hope our tool will enable inspection of optimized and minified WebAssembly modules, in which function names and most other semantic identifiers have been removed.
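The abstract does not spell out the IR or the classifier, so the following is only a toy illustration of the general idea of comparing an unnamed function against known named ones. It is not WASPur's actual method. The opcode-to-category table, the bag-of-tokens encoding, the cosine-similarity lookup (swapped in for the paper's machine learning classifier), and the sample functions are all assumptions made for this sketch.

```python
import math
from collections import Counter

# Hypothetical coarse categories for a handful of real WebAssembly opcodes.
# This mapping is an assumption for illustration, not the paper's IR.
CATEGORY = {
    "i32.add": "arith", "i32.sub": "arith", "i32.mul": "arith",
    "i32.load": "mem_read", "i32.load8_u": "mem_read",
    "i32.store": "mem_write", "i32.store8": "mem_write",
    "local.get": "local", "local.set": "local",
    "loop": "control", "br_if": "control", "call": "call",
}


def abstract_function(opcodes):
    """Map raw opcodes to coarse category tokens; unknown opcodes become 'other'."""
    return [CATEGORY.get(op, "other") for op in opcodes]


def encode(tokens):
    """Encode a token sequence as a bag-of-tokens count vector."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two Counter vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def closest_named(unknown_ops, named):
    """Return (name, score) of the named function most similar to unknown_ops."""
    query = encode(abstract_function(unknown_ops))
    scores = {
        name: cosine(query, encode(abstract_function(ops)))
        for name, ops in named.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Made-up instruction sequences standing in for functions with known names.
named = {
    "memcpy_like": ["loop", "local.get", "i32.load8_u", "local.get", "i32.store8",
                    "local.get", "i32.add", "local.set", "br_if"],
    "sum_like": ["loop", "local.get", "i32.load", "local.get", "i32.add",
                 "local.set", "br_if"],
}

# A made-up function whose name was stripped by minification.
unknown = ["loop", "local.get", "i32.load", "local.get", "i32.store",
           "local.get", "i32.add", "local.set", "br_if"]

best_name, best_score = closest_named(unknown, named)
print(best_name, round(best_score, 3))
```

Abstracting opcodes into categories is what lets a word-sized copy loop match a byte-sized one here. A real system would need a much richer representation and a trained model to approach the accuracy the abstract reports.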


