An Automated Approach to Discovering Software Refactorings by Comparing Successive Versions

IEEE Transactions on Software Engineering | IF 5.6 | Region 1 (Computer Science) | JCR Q1 (Computer Science, Software Engineering) | Pub Date: 2025-01-27 | DOI: 10.1109/TSE.2025.3534239
Bo Liu;Hui Liu;Nan Niu;Yuxia Zhang;Guangjie Li;He Jiang;Yanjie Jiang
IEEE Transactions on Software Engineering, vol. 51, no. 5, pp. 1358-1380. https://ieeexplore.ieee.org/document/10855639/
Citation count: 0

Abstract

Software developers and maintainers frequently refactor software to improve its quality. Identifying the refactorings that have been applied can significantly ease the comprehension of software evolution, and thus facilitate software maintenance. The identified refactorings are also valuable for data-driven approaches to software refactoring. Researchers have therefore proposed several approaches to identifying refactorings automatically. However, the performance (especially the precision) of such approaches leaves substantial room for improvement. To this end, we propose a novel refactoring detection approach called ReExtractor+. At the heart of ReExtractor+ are two algorithms: a reference-based entity matching algorithm that matches coarse-grained code entities (e.g., classes and methods) between two successive versions, and a context-aware statement matching algorithm that matches statements within a pair of matched methods. We evaluated ReExtractor+ on a benchmark consisting of 400 commits from 20 real-world projects. The results suggest that ReExtractor+ significantly outperforms the state of the art in refactoring detection, reducing the number of false positives by 57.4% and improving recall by 18.4%. We also evaluated the proposed matching algorithms, which serve as the cornerstone of refactoring detection. The results suggest that the algorithms excel at matching code entities, reducing the number of mistakes (false positives plus false negatives) by 67% compared to state-of-the-art approaches.
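The abstract does not spell out how reference-based entity matching works, so the following is only a minimal sketch of the general idea, not the ReExtractor+ algorithm itself. It assumes that each code entity is summarized by the set of entities it references, that similarity is measured with the Jaccard index, and that pairs are chosen greedily above a threshold. The function names, the threshold of 0.5, and the toy data are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_by_references(
    old_refs: Dict[str, Set[str]],
    new_refs: Dict[str, Set[str]],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[Tuple[str, str, float]]:
    """Greedily pair old and new entities whose reference sets overlap most."""
    candidates = [
        (jaccard(o_refs, n_refs), old, new)
        for old, o_refs in old_refs.items()
        for new, n_refs in new_refs.items()
    ]
    # Highest similarity first; ties are broken by name for determinism.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))

    matched_old: Set[str] = set()
    matched_new: Set[str] = set()
    pairs: List[Tuple[str, str, float]] = []
    for score, old, new in candidates:
        if score < threshold:
            break  # remaining candidates are all below the threshold
        if old in matched_old or new in matched_new:
            continue  # each entity is matched at most once
        matched_old.add(old)
        matched_new.add(new)
        pairs.append((old, new, score))
    return pairs


# Toy data (hypothetical): entity name -> names of the entities it references.
old_version = {
    "Parser.parse": {"Lexer.next", "Ast.build", "Log.warn"},
    "Util.trim": {"String.strip"},
}
new_version = {
    "Parser.parseSource": {"Lexer.next", "Ast.build", "Log.warn"},
    "Util.trim": {"String.strip"},
    "Util.pad": {"String.ljust"},
}

pairs = match_by_references(old_version, new_version)
for old, new, score in pairs:
    label = "unchanged name" if old == new else "rename candidate"
    print(f"{old} -> {new} (similarity {score:.2f}, {label})")
```

In this toy example the renamed method is still paired with its predecessor because its references are unchanged, while the newly added `Util.pad` stays unmatched. A real detector would need to handle entities whose references also changed between versions, which this sketch does not attempt.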
Source journal
IEEE Transactions on Software Engineering (Engineering & Technology - Engineering: Electrical & Electronic)
CiteScore: 9.70
Self-citation rate: 10.80%
Articles published: 724
Review time: 6 months
Journal description: IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of the Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.