Just-in-time identification for cross-project correlated issues
Hao Ren, Yanhui Li, Lin Chen, Yulu Cao, Xiaowei Zhang, Changhai Nie
Journal of Software: Evolution and Process, Vol. 36, No. 7, 2023-12-26. DOI: 10.1002/smr.2637 (https://onlinelibrary.wiley.com/doi/10.1002/smr.2637)
Abstract
Issue tracking systems are now prevalent in software development; they help developers submit and discuss issues to resolve development problems in software projects. Most previous studies analyze issue relations within a single project, for example by recommending similar or duplicate bug reports. However, as co-development across multiple projects becomes more common, many issues are cross-project correlated (CPC), that is, an issue is associated with another issue in a different project. CPC issues can be harder to resolve, because developers need information not only from their own project but also from related projects they are less familiar with. Identifying CPC issues as early as possible is a fundamental challenge, because early identification helps managers and developers allocate software maintenance resources and estimate the effort needed to resolve each issue. This paper proposes 11 issue metrics in two groups, describing the issue's textual summary and the reporter's activity, all of which can be extracted immediately after an issue is reported. We use these 11 metrics to build just-in-time (JIT) prediction models that identify CPC issues. To evaluate these models, we conduct experiments on 16 open-source data science and deep learning projects and compare our model with two baselines based on textual features, Term Frequency-Inverse Document Frequency (TF-IDF) and word embeddings, which are commonly adopted in previous issue prediction studies. The results show that the JIT prediction model based on issue metrics significantly improves CPC issue prediction on two evaluation measures, the Matthews correlation coefficient (MCC) and the F1 score. We also find that the prediction model is better suited to large-scale, complex core projects in the open-source ecosystem.
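Both evaluation measures named above are standard functions of the binary confusion matrix (true positives TP, false positives FP, false negatives FN, true negatives TN). The following is a minimal Python sketch, not the authors' evaluation code, that computes them while treating a CPC issue as the positive class (label 1). The function names and the example labels are hypothetical and serve only as illustration.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count (TP, FP, FN, TN) for binary labels, where 1 = CPC issue (hypothetical encoding)."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """F1 = 2PR / (P + R), equivalently 2TP / (2TP + FP + FN); 0.0 if undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; returns 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical labels for eight issues (illustration only, not data from the paper).
y_true = [1, 0, 1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 0, 1]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)           # 3 1 1 3
print(f1_score(tp, fp, fn))     # 0.75
print(mcc(tp, fp, fn, tn))      # 0.5
```

F1 ignores true negatives, whereas MCC uses all four cells of the confusion matrix. This makes MCC less sensitive to class imbalance, which would matter if, as seems plausible but is not stated here, CPC issues are a minority of all reported issues.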