An empirical study of just-in-time defect prediction using cross-project models

2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR) Pub Date : 2014-05-31 DOI:10.1145/2597073.2597075

Takafumi Fukushima, Yasutaka Kamei, Shane McIntosh, Kazuhiro Yamashita, Naoyasu Ubayashi

{"title":"An empirical study of just-in-time defect prediction using cross-project models","authors":"Takafumi Fukushima, Yasutaka Kamei, Shane McIntosh, Kazuhiro Yamashita, Naoyasu Ubayashi","doi":"10.1145/2597073.2597075","DOIUrl":null,"url":null,"abstract":"Prior research suggests that predicting defect-inducing changes, i.e., Just-In-Time (JIT) defect prediction is a more practical alternative to traditional defect prediction techniques, providing immediate feedback while design decisions are still fresh in the minds of developers. Unfortunately, similar to traditional defect prediction models, JIT models require a large amount of training data, which is not available when projects are in initial development phases. To address this flaw in traditional defect prediction, prior work has proposed cross-project models, i.e., models learned from older projects with sufficient history. However, cross-project models have not yet been explored in the context of JIT prediction. Therefore, in this study, we empirically evaluate the performance of JIT cross-project models. Through a case study on 11 open source projects, we find that in a JIT cross-project context: (1) high performance within-project models rarely perform well; (2) models trained on projects that have similar correlations between predictor and dependent variables often perform well; and (3) ensemble learning techniques that leverage historical data from several other projects (e.g., voting experts) often perform well. Our findings empirically confirm that JIT cross-project models learned using other projects are a viable solution for projects with little historical data. However, JIT cross-project models perform best when the data used to learn them is carefully selected.","PeriodicalId":6621,"journal":{"name":"2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)","volume":"2068 1","pages":"172-181"},"PeriodicalIF":0.0000,"publicationDate":"2014-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"163","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2597073.2597075","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 163

Abstract

Prior research suggests that predicting defect-inducing changes, i.e., Just-In-Time (JIT) defect prediction is a more practical alternative to traditional defect prediction techniques, providing immediate feedback while design decisions are still fresh in the minds of developers. Unfortunately, similar to traditional defect prediction models, JIT models require a large amount of training data, which is not available when projects are in initial development phases. To address this flaw in traditional defect prediction, prior work has proposed cross-project models, i.e., models learned from older projects with sufficient history. However, cross-project models have not yet been explored in the context of JIT prediction. Therefore, in this study, we empirically evaluate the performance of JIT cross-project models. Through a case study on 11 open source projects, we find that in a JIT cross-project context: (1) high performance within-project models rarely perform well; (2) models trained on projects that have similar correlations between predictor and dependent variables often perform well; and (3) ensemble learning techniques that leverage historical data from several other projects (e.g., voting experts) often perform well. Our findings empirically confirm that JIT cross-project models learned using other projects are a viable solution for projects with little historical data. However, JIT cross-project models perform best when the data used to learn them is carefully selected.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

使用跨项目模型的即时缺陷预测的实证研究

先前的研究表明，预测导致缺陷的变更，即即时(JIT)缺陷预测是传统缺陷预测技术更实用的替代方案，当设计决策在开发人员的头脑中仍然是新鲜的时候，提供即时反馈。不幸的是，与传统的缺陷预测模型类似，JIT模型需要大量的训练数据，而这些数据在项目处于初始开发阶段时是不可用的。为了解决传统缺陷预测中的这个缺陷，先前的工作已经提出了跨项目模型，即，从具有足够历史的旧项目中学习的模型。然而，跨项目模型尚未在JIT预测的背景下进行探索。因此，在本研究中，我们对JIT跨项目模型的性能进行了实证评估。通过对11个开源项目的案例研究，我们发现在JIT跨项目环境下:(1)项目内的高性能模型很少表现良好;(2)在预测变量和因变量之间具有相似相关性的项目上训练的模型通常表现良好;(3)利用来自其他几个项目(例如，投票专家)的历史数据的集成学习技术通常表现良好。我们的研究结果从经验上证实，使用其他项目学习的JIT跨项目模型是具有很少历史数据的项目的可行解决方案。然而，当仔细选择用于学习JIT的数据时，JIT跨项目模型的性能最好。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)

自引率

0.00%

发文量