Deep just-in-time defect prediction: how far are we?

Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis. Published: 2021-07-11. DOI: 10.1145/3460319.3464819

Zhen Zeng, Yuqun Zhang, Haotian Zhang, Lingming Zhang

Citations: 42

Abstract

Defect prediction aims to automatically identify potentially defective code with minimal human intervention and has been widely studied in the literature. Just-in-Time (JIT) defect prediction focuses on program changes rather than whole programs and has been widely adopted in continuous testing. CC2Vec, a state-of-the-art JIT defect prediction tool, first uses a hierarchical attention network (HAN) to learn distributed vector representations of code additions and deletions. It then concatenates these with two further embedding vectors, extracted by the existing DeepJIT approach, that represent the commit message and the overall code change, and trains a model on the result to predict whether a given commit is defective. Although CC2Vec has been shown to be the state of the art for JIT defect prediction, it was evaluated only on a limited dataset and was not compared with all representative baselines. To further investigate the efficacy and limitations of CC2Vec, this paper therefore performs an extensive study of CC2Vec on a large-scale dataset of over 310,370 changes (8.3× larger than the original CC2Vec dataset). We also empirically compare CC2Vec against DeepJIT and against representative traditional JIT defect prediction techniques. The experimental results show that CC2Vec cannot consistently outperform DeepJIT, and that neither of them can consistently outperform traditional JIT defect prediction. We also investigate the impact of individual traditional defect prediction features and find that the added-line-number feature outperforms the other traditional features. Inspired by this finding, we construct a simple JIT defect prediction approach that uses only the added-line-number feature with a logistic regression classifier. Surprisingly, this simple approach can outperform CC2Vec and DeepJIT in defect prediction, and can be 81k× faster in training and 120k× faster in testing. Furthermore, the paper provides practical guidelines for advancing JIT defect prediction in the near future.
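
The fusion step described in the abstract (separate vectors for added and deleted code, concatenated with a commit-message vector and an overall code-change vector) can be pictured with a toy sketch. This is not CC2Vec or DeepJIT code: it assumes hand-picked 2-dimensional embeddings, replaces the hierarchical attention network with a single level of dot-product attention, and ends in one logistic unit with made-up weights, whereas the real models learn their parameters from data. The names `attention_pool`, `fuse`, and `predict_defective` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softmax(scores: List[float]) -> List[float]:
    # Shift by the max score so math.exp cannot overflow.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors: List[Vector], query: Vector) -> Vector:
    """One level of dot-product attention: weight each vector by
    softmax(vector . query) and return the weighted sum.
    (A real HAN stacks several such levels with learned parameters.)"""
    weights = softmax([sum(v * q for v, q in zip(vec, query)) for vec in vectors])
    dim = len(vectors[0])
    return [sum(w * vec[i] for w, vec in zip(weights, vectors)) for i in range(dim)]


def fuse(added: List[Vector], deleted: List[Vector],
         msg_vec: Vector, code_vec: Vector, query: Vector) -> Vector:
    # Pool added and deleted lines separately, then concatenate them with
    # the commit-message and code-change embeddings (list + concatenates).
    return attention_pool(added, query) + attention_pool(deleted, query) + msg_vec + code_vec


def predict_defective(features: Vector, weights: Vector, bias: float) -> float:
    # Hypothetical single logistic output unit over the fused vector.
    return sigmoid(sum(f * w for f, w in zip(features, weights)) + bias)


# Hypothetical 2-dimensional embeddings, chosen by hand for illustration.
added = [[1.0, 0.0], [0.0, 1.0]]
deleted = [[0.5, 0.5]]
msg_vec = [0.2, 0.1]
code_vec = [0.3, 0.4]
query = [1.0, 0.0]

pooled_added = attention_pool(added, query)
features = fuse(added, deleted, msg_vec, code_vec, query)
print(len(features), predict_defective(features, [0.1] * len(features), 0.0))
```

Concatenation keeps the four representations side by side, so whatever classifier sits on top can weigh added code, deleted code, the message, and the overall change separately.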
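
The simple baseline from the abstract, logistic regression on the added-line-number feature alone, can likewise be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the `log1p` transform of the added-line count, batch gradient descent with the chosen learning rate and epoch count, ROC AUC as the evaluation metric, and the toy commit data are all hypothetical choices made here, and the data are invented rather than drawn from the study.

```python
import math
from typing import Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def transform(la: int) -> float:
    # Assumption: log(1 + la) to tame the heavy tail of added-line counts.
    return math.log1p(la)


def fit(added_lines: Sequence[int], labels: Sequence[int],
        lr: float = 0.1, epochs: int = 2000) -> Tuple[float, float]:
    """Fit p(defective | la) = sigmoid(w * log1p(la) + b) with batch
    gradient descent on the mean log-loss (single feature, no penalty)."""
    xs = [transform(la) for la in added_lines]
    n = len(xs)
    w = b = 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, labels):
            err = sigmoid(w * x + b) - y  # d(log-loss)/dz for one sample
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict(w: float, b: float, la: int) -> float:
    # Predicted probability that a commit adding `la` lines is defective.
    return sigmoid(w * transform(la) + b)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    # ROC AUC as the probability that a random defective commit outscores
    # a random clean one (ties count as half).
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: lines added per commit and a defect label (1 = defective).
train_la = [2, 5, 8, 12, 40, 60, 150, 300]
train_y = [0, 0, 0, 0, 1, 0, 1, 1]
test_la = [3, 20, 90, 250]
test_y = [0, 0, 1, 1]

w, b = fit(train_la, train_y)
test_scores = [predict(w, b, la) for la in test_la]
print(f"w={w:.3f} b={b:.3f} test AUC={auc(test_scores, test_y):.2f}")
```

Whenever the fitted `w` is positive, the predicted probability rises monotonically with the added-line count, so ranking-based metrics such as AUC depend only on that ordering of commits.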

Latest articles in this venue

Semantic table structure identification in spreadsheets
Parema: an unpacking framework for demystifying VM-based Android packers
TERA: optimizing stochastic regression tests in machine learning projects
Empirically evaluating readily available information for regression test optimization in continuous integration
RESTest: automated black-box testing of RESTful web APIs