An Empirical Analysis of Git Commit Logs for Potential Inconsistency in Code Clones

Reishi Yokomori, Katsuro Inoue
arXiv - CS - Software Engineering, published 2024-09-13. DOI: arxiv-2409.08555

Abstract

Code clones are code snippets that are identical or similar to other snippets within the same or different files. They are often created by copy-and-paste and then modified during development and maintenance. Because the two snippets of a clone pair may be logically coupled, changes to them are expected to be made simultaneously (co-changed) and consistently. Code clones have been studied extensively, including the co-change of clones; however, detailed analysis of commit logs for clone pairs has been limited. In this paper, we investigate the commit logs of code snippets from clone pairs, using the git-log command to extract changes to cloned code snippets. We analyzed 45 repositories owned by the Apache Software Foundation on GitHub and addressed three research questions regarding commit frequency, co-change ratio, and commit patterns. Our findings indicate that (1) on average, clone snippets are changed infrequently, typically only two or three times throughout their lifetime; (2) co-changes account for about half of all clone changes, with 10-20% of co-changed commits being concerning (potentially inconsistent); and (3) 35-65% of all clone pairs are classified as concerning clone pairs (potentially inconsistent clone pairs). These results suggest the need for a system that manages clones consistently across their commit timeline.
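
As a rough illustration of the kind of extraction the abstract describes, the sketch below collects the commits that touched each snippet of a clone pair with `git log -L` and computes a simple co-change ratio. It is not the authors' tooling: the function names, the `COMMIT:` output marker, and the definition of the ratio (commits touching both snippets divided by commits touching either) are assumptions made for this example, and it does not attempt the paper's classification of "concerning" commits.

```python
import subprocess


def snippet_commits(repo, path, start, end):
    """Return the hashes of commits that changed lines start..end of path.

    Assumption: the snippet is identified by its line range in the current
    revision; `git log -L` traces that range back through history.
    """
    out = subprocess.run(
        ["git", "-C", repo, "log", "--format=COMMIT:%H", f"-L{start},{end}:{path}"],
        capture_output=True, text=True, errors="replace", check=True,
    ).stdout
    # `git log -L` also prints a patch for each commit, so keep only the
    # lines carrying our (hypothetical) marker followed by the hash.
    marker = "COMMIT:"
    return {line[len(marker):] for line in out.splitlines() if line.startswith(marker)}


def co_change_ratio(commits_a, commits_b):
    """Share of clone changes in which both snippets changed together.

    Assumed definition: |A & B| / |A | B|, which may differ from the
    paper's own metric.
    """
    all_changes = commits_a | commits_b
    if not all_changes:
        return 0.0
    return len(commits_a & commits_b) / len(all_changes)


# Hypothetical usage (paths and line ranges are placeholders):
# a = snippet_commits("path/to/repo", "src/Foo.java", 10, 40)
# b = snippet_commits("path/to/repo", "src/Bar.java", 55, 85)
# print(co_change_ratio(a, b))
```

Keeping the history extraction separate from the ratio computation means the metric can be checked on hand-written commit sets without a repository.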