深度程序处理模型的鲁棒性——检测、估计和增强

ACM Transactions on Software Engineering and Methodology (TOSEM) Pub Date : 2022-01-31 DOI:10.1145/3511887

Huangzhao Zhang, Zhiyi Fu, Ge Li, L. Ma, Zhehao Zhao, Hua’an Yang, Yizhe Sun, Yang Liu, Zhi Jin

{"title":"深度程序处理模型的鲁棒性——检测、估计和增强","authors":"Huangzhao Zhang, Zhiyi Fu, Ge Li, L. Ma, Zhehao Zhao, Hua’an Yang, Yizhe Sun, Yang Liu, Zhi Jin","doi":"10.1145/3511887","DOIUrl":null,"url":null,"abstract":"Deep learning (DL) has recently been widely applied to diverse source code processing tasks in the software engineering (SE) community, which achieves competitive performance (e.g., accuracy). However, the robustness, which requires the model to produce consistent decisions given minorly perturbed code inputs, still lacks systematic investigation as an important quality indicator. This article initiates an early step and proposes a framework CARROT for robustness detection, measurement, and enhancement of DL models for source code processing. We first propose an optimization-based attack technique CARROTA to generate valid adversarial source code examples effectively and efficiently. Based on this, we define the robustness metrics and propose robustness measurement toolkit CARROTM, which employs the worst-case performance approximation under the allowable perturbations. We further propose to improve the robustness of the DL models by adversarial training (CARROTT) with our proposed attack techniques. Our in-depth evaluations on three source code processing tasks (i.e., functionality classification, code clone detection, defect prediction) containing more than 3 million lines of code and the classic or SOTA DL models, including GRU, LSTM, ASTNN, LSCNN, TBCNN, CodeBERT, and CDLH, demonstrate the usefulness of our techniques for ❶ effective and efficient adversarial example detection, ❷ tight robustness estimation, and ❸ effective robustness enhancement.","PeriodicalId":7398,"journal":{"name":"ACM Transactions on Software Engineering and Methodology (TOSEM)","volume":"122 1","pages":"1 - 40"},"PeriodicalIF":0.0000,"publicationDate":"2022-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"18","resultStr":"{\"title\":\"Towards Robustness of Deep Program Processing Models—Detection, Estimation, and Enhancement\",\"authors\":\"Huangzhao Zhang, Zhiyi Fu, Ge Li, L. Ma, Zhehao Zhao, Hua’an Yang, Yizhe Sun, Yang Liu, Zhi Jin\",\"doi\":\"10.1145/3511887\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep learning (DL) has recently been widely applied to diverse source code processing tasks in the software engineering (SE) community, which achieves competitive performance (e.g., accuracy). However, the robustness, which requires the model to produce consistent decisions given minorly perturbed code inputs, still lacks systematic investigation as an important quality indicator. This article initiates an early step and proposes a framework CARROT for robustness detection, measurement, and enhancement of DL models for source code processing. We first propose an optimization-based attack technique CARROTA to generate valid adversarial source code examples effectively and efficiently. Based on this, we define the robustness metrics and propose robustness measurement toolkit CARROTM, which employs the worst-case performance approximation under the allowable perturbations. We further propose to improve the robustness of the DL models by adversarial training (CARROTT) with our proposed attack techniques. Our in-depth evaluations on three source code processing tasks (i.e., functionality classification, code clone detection, defect prediction) containing more than 3 million lines of code and the classic or SOTA DL models, including GRU, LSTM, ASTNN, LSCNN, TBCNN, CodeBERT, and CDLH, demonstrate the usefulness of our techniques for ❶ effective and efficient adversarial example detection, ❷ tight robustness estimation, and ❸ effective robustness enhancement.\",\"PeriodicalId\":7398,\"journal\":{\"name\":\"ACM Transactions on Software Engineering and Methodology (TOSEM)\",\"volume\":\"122 1\",\"pages\":\"1 - 40\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-01-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"18\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Software Engineering and Methodology (TOSEM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3511887\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Software Engineering and Methodology (TOSEM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3511887","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 18

摘要

深度学习(DL)最近被广泛应用于软件工程(SE)社区的各种源代码处理任务，实现了具有竞争力的性能(例如，准确性)。然而，鲁棒性要求模型在给定轻微扰动的代码输入的情况下产生一致的决策，仍然缺乏作为重要质量指标的系统调查。本文开始了早期的一步，并提出了一个框架CARROT，用于源代码处理的深度学习模型的鲁棒性检测、测量和增强。我们首先提出了一种基于优化的攻击技术CARROTA，以有效地生成有效的对抗性源代码示例。在此基础上，我们定义了鲁棒性度量，并提出了鲁棒性度量工具包CARROTM，该工具包在允许扰动下采用最坏情况性能逼近。我们进一步提出利用我们提出的攻击技术，通过对抗性训练(CARROTT)来提高深度学习模型的鲁棒性。我们对包含300多万行代码的三个源代码处理任务(即功能分类、代码克隆检测、缺陷预测)和经典或SOTA深度学习模型(包括GRU、LSTM、ASTNN、LSCNN、TBCNN、CodeBERT和CDLH)进行了深入评估，证明了我们的技术在高效、有效的对抗示例检测、严格鲁棒性估计和有效鲁棒性增强方面的实用性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Towards Robustness of Deep Program Processing Models—Detection, Estimation, and Enhancement

Deep learning (DL) has recently been widely applied to diverse source code processing tasks in the software engineering (SE) community, which achieves competitive performance (e.g., accuracy). However, the robustness, which requires the model to produce consistent decisions given minorly perturbed code inputs, still lacks systematic investigation as an important quality indicator. This article initiates an early step and proposes a framework CARROT for robustness detection, measurement, and enhancement of DL models for source code processing. We first propose an optimization-based attack technique CARROTA to generate valid adversarial source code examples effectively and efficiently. Based on this, we define the robustness metrics and propose robustness measurement toolkit CARROTM, which employs the worst-case performance approximation under the allowable perturbations. We further propose to improve the robustness of the DL models by adversarial training (CARROTT) with our proposed attack techniques. Our in-depth evaluations on three source code processing tasks (i.e., functionality classification, code clone detection, defect prediction) containing more than 3 million lines of code and the classic or SOTA DL models, including GRU, LSTM, ASTNN, LSCNN, TBCNN, CodeBERT, and CDLH, demonstrate the usefulness of our techniques for ❶ effective and efficient adversarial example detection, ❷ tight robustness estimation, and ❸ effective robustness enhancement.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助