Bo Liu;Hui Liu;Nan Niu;Yuxia Zhang;Guangjie Li;He Jiang;Yanjie Jiang
Title: An Automated Approach to Discovering Software Refactorings by Comparing Successive Versions
Journal: IEEE Transactions on Software Engineering, vol. 51, no. 5, pp. 1358-1380
Publication date: 2025-01-27
DOI: 10.1109/TSE.2025.3534239
URL: https://ieeexplore.ieee.org/document/10855639/
Impact factor: 5.6 (JCR Q1, Computer Science, Software Engineering)
Abstract
Software developers and maintainers frequently refactor software to improve its quality. Identifying the refactorings that were performed can significantly facilitate the comprehension of software evolution, and thus software maintenance. The identified refactorings are also valuable for data-driven approaches to software refactoring. Researchers have therefore proposed a few approaches to identifying refactorings automatically. However, the performance (especially the precision) of such approaches leaves substantial room for improvement. To this end, we propose a novel refactoring detection approach called ReExtractor+. At the heart of ReExtractor+ are two algorithms: a reference-based entity matching algorithm that matches coarse-grained code entities (e.g., classes and methods) between two successive versions, and a context-aware statement matching algorithm that matches statements within a pair of matched methods. We evaluated ReExtractor+ on a benchmark consisting of 400 commits from 20 real-world projects. The results suggest that ReExtractor+ significantly outperformed the state of the art in refactoring detection, reducing the number of false positives by 57.4% and improving recall by 18.4%. We also evaluated the proposed matching algorithms, which serve as the cornerstone of refactoring detection. The results suggest that the algorithms excel at matching code entities, reducing the number of mistakes (false positives plus false negatives) by 67% compared to the state-of-the-art approaches.
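The abstract reports its results as precision, recall, false positives, and "mistakes" (false positives plus false negatives). The following minimal sketch shows how these quantities relate under their standard definitions. The class `Counts` and the function `relative_reduction` are hypothetical names introduced here for illustration, and any counts passed to them are made up; they are not figures from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Detection outcome counts for one tool on one benchmark (hypothetical)."""
    tp: int  # true positives: reported refactorings that are real
    fp: int  # false positives: reported refactorings that are not real
    fn: int  # false negatives: real refactorings that were missed

    @property
    def precision(self) -> float:
        reported = self.tp + self.fp
        return self.tp / reported if reported else 0.0

    @property
    def recall(self) -> float:
        actual = self.tp + self.fn
        return self.tp / actual if actual else 0.0

    @property
    def mistakes(self) -> int:
        # "Mistakes" as defined in the abstract: false positives plus false negatives.
        return self.fp + self.fn


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a quantity shrank, e.g. 0.25 means a 25% reduction."""
    if before == 0:
        raise ValueError("cannot compute a reduction from a zero baseline")
    return (before - after) / before


if __name__ == "__main__":
    # Illustrative counts only; not taken from the paper's evaluation.
    baseline = Counts(tp=80, fp=20, fn=20)
    improved = Counts(tp=90, fp=5, fn=10)
    print(f"baseline precision={baseline.precision:.3f} recall={baseline.recall:.3f}")
    print(f"improved precision={improved.precision:.3f} recall={improved.recall:.3f}")
    print(f"reduction in mistakes: {relative_reduction(baseline.mistakes, improved.mistakes):.1%}")
```

Note that a reduction in false positives raises precision only indirectly, since precision also depends on the number of true positives; this is why the abstract reports the false-positive reduction and the recall improvement as separate figures.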
About the journal:
IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.