An Automated Approach to Discovering Software Refactorings by Comparing Successive Versions

IF 5.6 · Zone 1, Computer Science · JCR Q1, COMPUTER SCIENCE, SOFTWARE ENGINEERING · IEEE Transactions on Software Engineering · Pub Date: 2025-01-27 · DOI: 10.1109/TSE.2025.3534239

Bo Liu, Hui Liu, Nan Niu, Yuxia Zhang, Guangjie Li, He Jiang, Yanjie Jiang

Published in IEEE Transactions on Software Engineering, vol. 51, no. 5, pp. 1358-1380.

Citations: 0

Abstract

Software developers and maintainers frequently refactor software to improve its quality. Identifying the refactorings that have been applied can significantly facilitate the comprehension of software evolution, and thus support software maintenance and evolution. The identified refactorings are also valuable for data-driven approaches to software refactoring. Consequently, researchers have proposed several approaches to identifying software refactorings automatically. However, the performance (especially the precision) of such approaches leaves substantial room for improvement. To this end, in this paper, we propose a novel refactoring detection approach, called ReExtractor+. At the heart of ReExtractor+ are a reference-based entity matching algorithm that matches coarse-grained code entities (e.g., classes and methods) between two successive versions, and a context-aware statement matching algorithm that matches statements within a pair of matched methods. We evaluated ReExtractor+ on a benchmark consisting of 400 commits from 20 real-world projects. The evaluation results suggested that ReExtractor+ significantly outperformed the state of the art in refactoring detection, reducing the number of false positives by 57.4% and improving recall by 18.4%. We also evaluated the proposed matching algorithms, which serve as the cornerstone of refactoring detection. The evaluation results suggested that the proposed algorithms excel in matching code entities, reducing the number of mistakes (false positives plus false negatives) by 67% compared to the state-of-the-art approaches.
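The abstract does not spell out how reference-based entity matching works, so the following is only a hypothetical sketch of the general idea, not ReExtractor+'s actual algorithm. It assumes that each method is modeled as a signature plus the set of entity names it references, that entities with unchanged signatures are matched first, and that the remaining entities are paired greedily by Jaccard similarity of their references (after translating old names through matches found so far), with an assumed threshold of 0.5. The names `Entity`, `jaccard`, and `match_entities` are illustrative inventions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """Hypothetical model of a coarse-grained code entity (e.g., a method)."""
    name: str                                            # signature, e.g. "A.load()"
    refs: frozenset = field(default_factory=frozenset)   # names of entities it references


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_entities(old: list, new: list, threshold: float = 0.5) -> dict:
    """Map old-version entity names to new-version entity names (illustrative only)."""
    new_names = {e.name for e in new}
    # Step 1: entities whose signature is unchanged are matched directly.
    matches = {e.name: e.name for e in old if e.name in new_names}

    changed = True
    while changed:  # each new match can make more references comparable, so repeat
        changed = False
        matched_new = set(matches.values())
        candidates = []
        for o in old:
            if o.name in matches:
                continue
            # Translate old references into new-version names via current matches,
            # so a call to a renamed method still counts as the same reference.
            translated = {matches.get(r, r) for r in o.refs}
            for n in new:
                if n.name in matched_new:
                    continue
                score = jaccard(translated, set(n.refs))
                if score >= threshold:
                    candidates.append((score, o.name, n.name))
        # Step 2: greedily accept the highest-scoring pairs, one match per entity.
        for score, o_name, n_name in sorted(candidates, reverse=True):
            if o_name not in matches and n_name not in matched_new:
                matches[o_name] = n_name
                matched_new.add(n_name)
                changed = True
    return matches


# Usage: two methods renamed in one commit; matching the callee enables the caller.
old = [
    Entity("A.load()", frozenset({"A.parse()"})),
    Entity("A.parse()", frozenset({"Str.trim()"})),
]
new = [
    Entity("A.read()", frozenset({"A.parseText()"})),
    Entity("A.parseText()", frozenset({"Str.trim()"})),
]
print(match_entities(old, new))
# → {'A.parse()': 'A.parseText()', 'A.load()': 'A.read()'}
```

In this toy run, `A.load()` and `A.read()` share no reference names at first; only after `A.parse()` is matched to `A.parseText()` do their translated references coincide. That dependency is why the sketch iterates until no new match appears, and it illustrates why matching by references, rather than by names alone, can recover renamed entities.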

Source journal

IEEE Transactions on Software Engineering (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 9.70
Self-citation rate: 10.80%
Publication volume: 724
Review time: 6 months

Journal description: IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.