Cold-Start Software Analytics

2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR). Published: 2016-05-14. DOI: 10.1145/2901739.2901740

Jin Guo, Mona Rahimi, J. Cleland-Huang, A. Rasin, J. Hayes, Michael Vierhauser

Pages: 142-153

Citations: 25

Abstract

Software project artifacts such as source code, requirements, and change logs represent a gold mine of actionable information. As a result, software analytics solutions have been developed to mine repositories and answer questions such as "Who is the expert?", "Which classes are fault-prone?", or even "Who are the domain experts for these fault-prone classes?" Analytics often require training and configuration to maximize performance within the context of each project. A cold-start problem exists when an analytic function is applied within a project context without first being configured on project-specific data. This scenario arises because of the non-trivial effort needed to instrument a project environment with candidate tools and algorithms and to empirically evaluate alternative configurations. We address the cold-start problem by comparatively evaluating "best-of-breed" and "profile-driven" solutions, both of which reuse known configurations in new project contexts. We describe and evaluate our approach against 20 project datasets for three analytic areas: artifact connectivity, fault prediction, and finding the expert. The results show that the best-of-breed approach outperformed the profile-driven approach in all three areas; however, while it delivered acceptable results for artifact connectivity and finding the expert, both techniques underperformed for cold-start fault prediction.
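The abstract does not specify how either strategy selects a configuration. As a rough illustration only, the sketch below assumes one plausible reading: best-of-breed reuses the configuration with the highest mean score across previously studied projects, while profile-driven reuses the best configuration of the known project whose profile vector is closest (by Euclidean distance) to the new project's profile. The score table, profile vectors, configuration names, and the functions `best_of_breed` and `profile_driven` are all hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List

# Hypothetical layout: scores[project][config] holds a performance metric
# (higher is better) for one configuration on one previously studied project.
Scores = Dict[str, Dict[str, float]]
# Hypothetical project profile: one numeric feature vector per known project.
Profiles = Dict[str, List[float]]


def best_of_breed(scores: Scores) -> str:
    """Return the configuration with the highest mean score over all known projects."""
    configs = sorted({c for per_project in scores.values() for c in per_project})

    def mean_score(config: str) -> float:
        # Average only over projects where this configuration was evaluated.
        values = [p[config] for p in scores.values() if config in p]
        return sum(values) / len(values)

    return max(configs, key=mean_score)


def profile_driven(scores: Scores, profiles: Profiles, new_profile: List[float]) -> str:
    """Return the best configuration of the known project closest to the new profile."""
    # Nearest neighbour by Euclidean distance between profile vectors.
    nearest = min(sorted(profiles), key=lambda p: math.dist(profiles[p], new_profile))
    per_project = scores[nearest]
    return max(sorted(per_project), key=lambda c: per_project[c])


# Illustrative, made-up data for three known projects and two configurations.
scores = {
    "projA": {"cfg_a": 0.62, "cfg_b": 0.55},
    "projB": {"cfg_a": 0.48, "cfg_b": 0.58},
    "projC": {"cfg_a": 0.70, "cfg_b": 0.52},
}
profiles = {"projA": [1.0, 0.2], "projB": [0.1, 0.9], "projC": [0.8, 0.3]}

print(best_of_breed(scores))                          # cfg_a (mean 0.60 vs 0.55)
print(profile_driven(scores, profiles, [0.2, 0.8]))   # cfg_b (nearest project is projB)
```

Under these assumptions the two strategies can disagree: the configuration that is strongest on average is not necessarily the best one for the most similar known project.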


