An Empirical Analysis of Git Commit Logs for Potential Inconsistency in Code Clones
Reishi Yokomori, Katsuro Inoue
arXiv - CS - Software Engineering, 2024-09-13
Identifier: arxiv-2409.08555
Abstract
Code clones are code snippets that are identical or similar to other snippets
within the same or different files. They are often created through
copy-and-paste practices and modified during development and maintenance
activities. Since a pair of code clones, known as a clone pair, has a possible
logical coupling between them, it is expected that changes to each snippet are
made simultaneously (co-changed) and consistently. There is extensive research
on code clones, including studies related to the co-change of clones; however,
detailed analysis of commit logs for code clone pairs has been limited.

In this paper, we investigate the commit logs of code snippets from clone
pairs, using the git-log command to extract changes to cloned code snippets. We
analyzed 45 repositories owned by the Apache Software Foundation on GitHub and
addressed three research questions regarding commit frequency, co-change ratio,
and commit patterns. Our findings indicate that (1) on average, clone snippets
are changed infrequently, typically only two or three times throughout their
lifetime; (2) co-changes account for about half of all clone changes, with
10-20% of co-changed commits classified as concerning (potentially
inconsistent); and (3) 35-65% of all clone pairs are classified as concerning
clone pairs (potentially inconsistent clone pairs). These results suggest the
need for a system that manages clones consistently across their commit
history.
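The extraction and co-change steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' tooling. It rests on stated assumptions: a clone snippet is identified by a file path and a line range at the current revision; `git log -L<start>,<end>:<file>` is used to collect the commits that touched that range; and the co-change ratio of a clone pair is taken to be the fraction of commits touching either snippet that touch both. The abstract does not give the exact formula. The helper names (`snippet_commits`, `parse_commit_markers`, `co_change_ratio`) and the `COMMIT:` output marker are hypothetical.

```python
import subprocess


def parse_commit_markers(log_text: str) -> set[str]:
    """Collect commit hashes from lines printed by --format=COMMIT:%H.

    `git log -L` always prints a diff after each commit header. The
    hypothetical "COMMIT:" prefix separates hash lines from diff lines,
    none of which begin with that prefix.
    """
    prefix = "COMMIT:"
    return {
        line[len(prefix):].strip()
        for line in log_text.splitlines()
        if line.startswith(prefix)
    }


def snippet_commits(repo: str, path: str, start: int, end: int) -> set[str]:
    """Return hashes of commits that touched lines start..end of path in repo.

    `git log -L<start>,<end>:<file>` traces the history of that line range,
    starting from the current revision.
    """
    result = subprocess.run(
        ["git", "-C", repo, "log", f"-L{start},{end}:{path}", "--format=COMMIT:%H"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_commit_markers(result.stdout)


def co_change_ratio(commits_a: set[str], commits_b: set[str]) -> float:
    """Share of commits touching either snippet that touch both.

    Assumed definition: the abstract does not state the exact formula.
    """
    changed = commits_a | commits_b
    if not changed:
        return 0.0
    return len(commits_a & commits_b) / len(changed)


# Hypothetical commit sets for the two snippets of one clone pair.
a = {"c1", "c2", "c3"}
b = {"c2", "c3", "c4"}
print(co_change_ratio(a, b))  # 0.5
```

In the toy example, two of the four commits touch both snippets, which gives 0.5. Working on sets of commit hashes keeps the metric independent of how each snippet's history was extracted.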