Towards Accurate File Tracking Based on AST Differences
Akira Fujimoto, Yoshiki Higo, S. Kusumoto
2021 28th Asia-Pacific Software Engineering Conference (APSEC), published 2021-12-01
DOI: 10.1109/APSEC53868.2021.00067
Abstract
In software development, version control systems such as Git are indispensable tools that help software teams manage source code. Git can trace the change history of each file individually. Even if a file was renamed in the past, Git can identify the file as it existed before the rename and keep tracking it. It does this using content similarity, calculated as the ratio of lines that match between the pre- and post-change files to the total number of lines. However, line-based comparison does not consider source code structure and has coarse granularity. As a result, pre-change files can be misidentified and tracking can be interrupted. To resolve these problems, this paper proposes a technique that calculates file content similarity from source code differences computed on abstract syntax trees (ASTs). In experiments on 197 open-source Java projects, the number of detected renames increased by 3.3%. On average, our technique also tracked 1.37 times more commits than the previous technique. We also measured accuracy and found that the maximum F-measure was 0.943, higher than the maximum of 0.926 achieved by the line-based technique.
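The contrast between line-based and structure-based similarity can be made concrete with a small sketch. It rests on two assumptions.

First, `line_similarity` is a simplified model of the line-based measure described above. It counts lines aligned by Python's `difflib.SequenceMatcher` in both files and divides by the combined line count. Git's actual rename detection hashes chunks of content and is not reproduced here.

Second, `ast_similarity` is a hypothetical and much cruder stand-in for an AST-based measure. It compares multisets of node labels produced by Python's built-in `ast` module. The paper targets Java, and this abstract does not describe its differencing algorithm.

All function and variable names below are illustrative.

```python
import ast
from collections import Counter
from difflib import SequenceMatcher


def line_similarity(old: str, new: str) -> float:
    """Simplified line-based similarity (an assumption, not Git's exact algorithm).

    Lines matched by difflib's alignment are counted in both files and divided
    by the combined line count: 2 * matched / (len(old) + len(new)).
    """
    old_lines, new_lines = old.splitlines(), new.splitlines()
    return SequenceMatcher(None, old_lines, new_lines, autojunk=False).ratio()


def _node_label(node: ast.AST) -> str:
    """Hypothetical helper: label a node by its type plus any identifier or literal."""
    for field in ("id", "name", "attr", "arg"):
        value = getattr(node, field, None)
        if isinstance(value, str):
            return f"{type(node).__name__}:{value}"
    if isinstance(node, ast.Constant):
        return f"Constant:{node.value!r}"
    return type(node).__name__


def ast_similarity(old: str, new: str) -> float:
    """Hypothetical AST-based stand-in: Dice overlap of AST node-label multisets.

    Layout (indentation, line breaks, redundant parentheses) is not part of
    the AST, so it cannot affect this score.
    """
    old_nodes = Counter(_node_label(n) for n in ast.walk(ast.parse(old)))
    new_nodes = Counter(_node_label(n) for n in ast.walk(ast.parse(new)))
    shared = sum((old_nodes & new_nodes).values())
    return 2 * shared / (sum(old_nodes.values()) + sum(new_nodes.values()))


# Hypothetical example: the same function, re-indented from 4 to 2 spaces.
OLD = """def total(items):
    result = 0
    for item in items:
        result += item.price * item.quantity
    return result
"""

NEW = """def total(items):
  result = 0
  for item in items:
    result += item.price * item.quantity
  return result
"""

if __name__ == "__main__":
    print(f"line-based: {line_similarity(OLD, NEW):.2f}")  # line-based: 0.20
    print(f"AST-based:  {ast_similarity(OLD, NEW):.2f}")   # AST-based:  1.00
```

Only the first line survives the re-indentation unchanged. The line-based score therefore drops to 0.20, well under a 50% cut-off such as Git's default rename threshold. The AST-based score stays at 1.00. Renaming an identifier (for example `price` to `cost`) does lower the AST score, because the labels include identifiers.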