MoCo: Fuzzing Deep Learning Libraries via Assembling Code

IF 5.6 · CAS Tier 1 (Computer Science) · Q1 (Computer Science, Software Engineering) · IEEE Transactions on Software Engineering · Published: 2024-12-02 · DOI: 10.1109/TSE.2024.3509975
Pin Ji;Yang Feng;Duo Wu;Lingyue Yan;Penglin Chen;Jia Liu;Zhihong Zhao
IEEE Transactions on Software Engineering, vol. 51, no. 2, pp. 371–388.

Abstract

Rapidly developing Deep Learning (DL) techniques have been applied in software systems of various types. However, they can also pose new safety threats with potentially serious consequences, especially in safety-critical domains. DL libraries serve as the underlying foundation for DL systems, and bugs in them can have unpredictable impacts that directly affect the behavior of DL systems. Previous research on fuzzing DL libraries still has limitations in generating tests that correspond to crucial testing scenarios and in constructing test oracles. In this paper, we propose MoCo, a novel fuzzing method for DL libraries via assembling code. The seed tests used by MoCo are code files that implement DL models, covering both model construction and training, the most common real-world application scenarios for DL libraries. MoCo first disassembles the seed code files to extract templates and code blocks, then applies code block mutation operators (e.g., API replacement, random generation, and boundary checking) to generate new code blocks that fit the template. To ensure the correctness of the code block mutation, we employ a Large Language Model to parse the official documentation of DL libraries for information about the parameters and the constraints between them. By inserting context-appropriate code blocks into the template, MoCo can generate a tree of code files with intergenerational relations. Based on the derivation relations in this tree, we construct the test oracle from execution state consistency and calculation result consistency. Since the granularity of code assembly is controlled rather than randomly divergent, we can quickly pinpoint the lines of code where bugs are located and the corresponding triggering conditions. We conduct a comprehensive experiment to evaluate the efficiency and effectiveness of MoCo using three widely used DL libraries (i.e., TensorFlow, PyTorch, and Jittor). During the experiments, MoCo detects 77 new bugs of four types across the three DL libraries, of which 55 have been confirmed and 39 have been fixed by developers. The experimental results demonstrate that MoCo can generate high-quality tests that cover crucial testing scenarios and detect different types of bugs, which helps developers improve the reliability of DL libraries.
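The abstract's core idea — disassemble a seed file into a template plus code blocks, mutate a block (e.g., API replacement), reassemble, and compare parent and child under an execution-state and calculation-result oracle — can be illustrated with a deliberately simplified sketch. All names here (`TEMPLATE`, `assemble`, `mutate`, `oracle`) are hypothetical illustrations, not the paper's implementation, and plain arithmetic stands in for real DL API calls so the example stays dependency-free:

```python
# Hypothetical, simplified sketch of MoCo-style "code assembly" fuzzing.
# Plain arithmetic replaces DL API calls (e.g., layer constructors); the
# structure (template + blocks + mutation + differential oracle) is the point.
import math

# A "template" is code with named slots; "code blocks" fill the slots.
TEMPLATE = """
def model(x):
    y = {block0}
    z = {block1}
    return z
"""

SEED_BLOCKS = {"block0": "x * 2.0", "block1": "y + 1.0"}

# Mutation operator: API replacement -- swap a call for a documented,
# constraint-compatible alternative (here: a semantically equivalent pair).
API_REPLACEMENTS = {"x * 2.0": "x + x"}

def assemble(blocks):
    """Insert code blocks into the template and compile the result."""
    source = TEMPLATE.format(**blocks)
    namespace = {}
    exec(source, namespace)  # execution-state check: the file must run at all
    return namespace["model"]

def mutate(blocks):
    """Produce a child code file by replacing one block."""
    mutated = dict(blocks)
    mutated["block0"] = API_REPLACEMENTS.get(blocks["block0"], blocks["block0"])
    return mutated

def oracle(parent_blocks, child_blocks, inputs, tol=1e-6):
    """Differential oracle: parent and child must agree in execution state
    (both assemble and run) and in calculation results (outputs match)."""
    parent = assemble(parent_blocks)
    child = assemble(child_blocks)
    for x in inputs:
        if not math.isclose(parent(x), child(x), rel_tol=tol):
            return False  # inconsistency => candidate library bug
    return True

print(oracle(SEED_BLOCKS, mutate(SEED_BLOCKS), [0.0, 1.5, -3.25]))  # True
```

In the real system the blocks would invoke TensorFlow/PyTorch/Jittor APIs, the replacement table would be derived by the LLM from official documentation constraints, and an inconsistency would localize the bug to the single mutated block rather than a whole generated program.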
Source Journal
IEEE Transactions on Software Engineering (Engineering Technology — Engineering: Electrical & Electronic)
CiteScore: 9.70
Self-citation rate: 10.80%
Articles per year: 724
Review time: 6 months
About the Journal: IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurement and evaluation of process and product aspects.
c) Software project management: productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: hardware-software trade-offs.
f) State-of-the-art surveys: syntheses and comprehensive reviews of the historical development within specific areas of interest.