Towards Accurate File Tracking Based on AST Differences

Akira Fujimoto, Yoshiki Higo, S. Kusumoto
2021 28th Asia-Pacific Software Engineering Conference (APSEC), published 2021-12-01. DOI: https://doi.org/10.1109/APSEC53868.2021.00067

Abstract

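Version control systems such as Git are essential tools that help software teams manage source code. Git can trace the change history of each file individually. Even if a file was renamed in the past, Git can identify and track the file as it existed before the rename based on content similarity, which is calculated as the ratio of lines that match between the pre- and post-change files to the total number of lines. However, line-based comparison does not consider source code structure and is coarse-grained, which can lead to misidentified pre-change files and interrupted tracking. To resolve these problems, this paper proposes a technique that calculates file content similarity using source code differences based on an abstract syntax tree (AST). In experiments conducted on 197 open source Java projects, the number of rename detections increased by 3.3%, and, on average, the proposed technique tracked commits 1.37 times more often than the previous technique. Accuracy was also measured: the maximum F-measure was 0.943, which is higher than the maximum of 0.926 achieved by the line-based technique.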
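The contrast between line-based and structure-based similarity can be illustrated with a small sketch. This is not the authors' implementation: the paper targets Java, while the example below uses Python's standard `ast` module purely for convenience. The function names (`line_similarity`, `ast_similarity`, `node_types`), the normalisation by the longer input, and the use of node type names as a stand-in for a real AST diff are all assumptions made for illustration.

```python
import ast
import difflib


def _matched_ratio(a: list, b: list) -> float:
    """Matched elements divided by the length of the longer sequence.

    Normalising by the longer sequence is an assumption of this sketch.
    """
    if not a and not b:
        return 1.0
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / max(len(a), len(b))


def line_similarity(before: str, after: str) -> float:
    """Line-based similarity: share of lines that match between two versions."""
    return _matched_ratio(before.splitlines(), after.splitlines())


def node_types(source: str) -> list[str]:
    """Flatten a Python AST into a sequence of node type names.

    Identifiers and literals are ignored, which is a deliberate simplification.
    """
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


def ast_similarity(before: str, after: str) -> float:
    """Structure-based similarity over AST node types (illustrative stand-in)."""
    return _matched_ratio(node_types(before), node_types(after))


# The same function before and after a pure reformatting change.
before = "def area(w, h):\n    return w * h\n"
after = "def area(\n    w,\n    h,\n):\n    return w * h\n"

print(line_similarity(before, after))  # 0.2: only the return line matches
print(ast_similarity(before, after))   # 1.0: the syntax tree is unchanged
```

In this toy case a reformatting that touches every line of the signature drops the line-based score to 0.2, while the structural score stays at 1.0, which is the kind of gap between line-level and syntax-level comparison that the abstract describes.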