AI-based clustering of similar issues in GitHub’s repositories

IF 1.7 | Region 3 (Computer Science) | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Journal of Computer Languages | Pub Date: 2024-01-04 | DOI: 10.1016/j.cola.2023.101257
Hamzeh Eyal Salman
Journal of Computer Languages, Volume 78, Article 101257. Publication type: Journal Article. https://www.sciencedirect.com/science/article/pii/S2590118423000679
Citations: 0

Abstract

Issues are highly prevalent on GitHub because of the growing scale of its software repositories. They are submitted to the issue tracking system for several reasons: reporting a bug, asking a question, or other maintenance activities. Popular repositories on GitHub receive a large number of issues daily. Assigning similar issues individually to different developers for validation and fixing slows the fixing process and introduces inconsistencies, because the developers work independently and asynchronously. In contrast, grouping similar issues into clusters and assigning each cluster to the same appropriate developer or team speeds up the fixing process. This paper proposes a machine-learning-based approach that supports issue management on GitHub by grouping similar issues together. To validate it, the approach was applied to 13 software components from different large repositories. The findings show that the approach identifies clusters of similar issues with promising results on evaluation measures widely used in this area: Precision, Recall, and F-measure.
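
The abstract does not name the clustering algorithm or the text representation, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes issues are represented by their titles, compares them with Jaccard similarity over word sets, merges any pair above a hypothetical threshold of 0.3 (single-link clustering), and then scores the result with pairwise Precision, Recall, and F-measure against hand-assigned labels. The sample issue titles, the threshold, and all function names are illustrative assumptions.

```python
import re
from itertools import combinations


def tokens(text):
    """Lowercased word set of an issue title (a deliberately crude representation)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_issues(issues, threshold=0.3):
    """Single-link clustering: merge every pair whose similarity reaches the threshold.

    Returns one cluster label per issue. The threshold is an assumed value.
    """
    parent = list(range(len(issues)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    token_sets = [tokens(t) for t in issues]
    for i, j in combinations(range(len(issues)), 2):
        if jaccard(token_sets[i], token_sets[j]) >= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(issues))]


def pairwise_scores(predicted, truth):
    """Pairwise Precision, Recall, and F-measure of a clustering.

    A pair of issues counts as a true positive when both the predicted
    and the reference clustering put the two issues in the same cluster.
    """
    tp = fp = fn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_pred = predicted[i] == predicted[j]
        same_true = truth[i] == truth[j]
        if same_pred and same_true:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure


# Made-up issue titles and reference labels, for illustration only.
issues = [
    "App crashes when opening settings page",
    "Crash when opening the settings page",
    "How do I configure proxy support?",
    "Question: how to configure proxy support",
    "Update README badges",
]
truth = [0, 0, 1, 1, 2]

labels = cluster_issues(issues)
scores = pairwise_scores(labels, truth)
print(labels, scores)
```

In practice a richer representation (for example TF-IDF or embeddings over titles and bodies) and a tuned clustering algorithm would likely replace the word-set comparison; the pairwise evaluation step stays the same whichever clustering is used.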
