AI-based clustering of similar issues in GitHub’s repositories

IF 1.8 3区计算机科学 Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING Journal of Computer Languages Pub Date : 2024-03-01 Epub Date: 2024-01-04 DOI:10.1016/j.cola.2023.101257

Hamzeh Eyal Salman

{"title":"AI-based clustering of similar issues in GitHub’s repositories","authors":"Hamzeh Eyal Salman","doi":"10.1016/j.cola.2023.101257","DOIUrl":null,"url":null,"abstract":"<div><p>Issues are highly prevalent on GitHub due to the increasing scale of its software repositories. These issues are submitted to the issue tracking system for several reasons: reporting a bug, asking a question, or other maintenance activities. The attractive repositories on Github receive a large number of issues daily. Assigning similar issues individually to different developers for validating and fixing introduces inconsistencies when asynchronously independent developers fix them, in addition to slowing the fixing process. However, grouping similar issues into clusters and assigning each cluster to the same and appropriate developer/team speeds up the fixing process. In this paper, a machine learning algorithm-based approach has been proposed to support issue management on GitHub by grouping similar issues together. For validity, the proposed approach was applied to 13 software components from different and large repositories. Findings reveal that the proposed approach identifies similar clusters of issues with promising results using widely used evaluation measures in this subject: Precision, Recall, and F-measure.</p></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"78 ","pages":"Article 101257"},"PeriodicalIF":1.8000,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Computer Languages","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2590118423000679","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/4 0:00:00","PubModel":"Epub","JCR":"Q3","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}

引用次数: 0

Abstract

Issues are highly prevalent on GitHub due to the increasing scale of its software repositories. These issues are submitted to the issue tracking system for several reasons: reporting a bug, asking a question, or other maintenance activities. The attractive repositories on Github receive a large number of issues daily. Assigning similar issues individually to different developers for validating and fixing introduces inconsistencies when asynchronously independent developers fix them, in addition to slowing the fixing process. However, grouping similar issues into clusters and assigning each cluster to the same and appropriate developer/team speeds up the fixing process. In this paper, a machine learning algorithm-based approach has been proposed to support issue management on GitHub by grouping similar issues together. For validity, the proposed approach was applied to 13 software components from different and large repositories. Findings reveal that the proposed approach identifies similar clusters of issues with promising results using widely used evaluation measures in this subject: Precision, Recall, and F-measure.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

基于人工智能的 GitHub 仓库类似问题聚类

由于 GitHub 软件仓库的规模不断扩大，问题在 GitHub 上非常普遍。向问题跟踪系统提交这些问题有几个原因：报告错误、提出问题或其他维护活动。Github 上极具吸引力的软件源每天都会收到大量问题。将类似的问题单独分配给不同的开发人员进行验证和修复，除了会减慢修复进程外，还会在独立开发人员异步修复问题时引入不一致性。然而，将类似问题分组并将每个分组分配给相同且合适的开发人员/团队，可以加快修复过程。本文提出了一种基于机器学习算法的方法，通过将类似问题分组来支持 GitHub 上的问题管理。为了验证该方法的有效性，我们将其应用于来自不同大型软件库的 13 个软件组件。研究结果表明，所提出的方法能识别出类似的问题群组，并在该领域广泛使用的评估指标中取得了良好的结果：精确度、召回率和 F-测度。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Journal of Computer Languages Computer Science-Computer Networks and Communications

CiteScore

5.00

自引率

13.60%

发文量