Context:
Most research in software clustering and remodularisation concludes by recommending refactoring operations without further insight into the practicality of the proposed technique. Developers may hesitate to follow through with these suggestions because the effort they require is uncertain.
Objective:
This work addresses this gap by introducing an effoRt Estimation AppRoach foR softwAre clusteriNG-based rEmodularisation (REARRANGE). REARRANGE closes the loop in existing software clustering and remodularisation research by estimating, from the software's evolution history, the time required to carry out the suggested refactoring operations. By providing tangible estimates of refactoring effort in person-hours, REARRANGE shows developers which refactoring operations are complex and time-consuming. This helps them prioritise refactoring efforts and weave these activities into sprint planning.
Method:
REARRANGE builds a machine learning model that predicts refactoring effort from past commit activity. From each commit it extracts Software Features (lines of code, number of methods), Refactoring Features (refactoring type, source and destination) and Dependency Features (dependencies between classes). REARRANGE is then compared against sanity checks, baseline effort estimation models, and state-of-the-art software estimation models. We also attempt to cross-validate REARRANGE's effort estimates with software developers.
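To make the Method concrete, the following is a minimal sketch of how the three feature groups could be encoded and used to predict person-hours from past commits. The abstract does not name the learning algorithm, so this sketch swaps in k-nearest-neighbour regression purely for illustration. `RefactoringRecord`, `to_vector`, `predict_effort`, the `cross_package` encoding of source and destination, the list of refactoring types, and all sample values are hypothetical. Features are left unscaled for brevity, whereas a real model would likely normalise them.

```python
from dataclasses import dataclass
from math import sqrt

# Hypothetical refactoring types used for one-hot encoding (not the paper's list).
REFACTORING_TYPES = ("Move Class", "Move Method", "Extract Class")


@dataclass
class RefactoringRecord:
    loc: int                   # Software Feature: lines of code involved
    num_methods: int           # Software Feature: number of methods
    refactoring_type: str      # Refactoring Feature: type of operation
    cross_package: bool        # Refactoring Feature: source/destination in different packages (assumed encoding)
    dependencies: int          # Dependency Feature: number of class-level dependencies
    effort_hours: float = 0.0  # Label: observed effort in person-hours (unknown for new operations)


def to_vector(r: RefactoringRecord) -> list[float]:
    """Flatten one record into a numeric feature vector (no scaling applied)."""
    one_hot = [1.0 if r.refactoring_type == t else 0.0 for t in REFACTORING_TYPES]
    return [float(r.loc), float(r.num_methods), float(r.cross_package),
            float(r.dependencies), *one_hot]


def predict_effort(history: list[RefactoringRecord],
                   query: RefactoringRecord, k: int = 3) -> float:
    """k-NN regression: mean effort of the k most similar past refactorings."""
    q = to_vector(query)

    def distance(r: RefactoringRecord) -> float:
        # Euclidean distance between a past record and the query
        return sqrt(sum((a - b) ** 2 for a, b in zip(to_vector(r), q)))

    nearest = sorted(history, key=distance)[:k]
    return sum(r.effort_hours for r in nearest) / len(nearest)


# Hypothetical commit history mined from a project's evolution.
history = [
    RefactoringRecord(120, 8, "Move Class", True, 5, 6.0),
    RefactoringRecord(40, 3, "Move Method", False, 1, 1.5),
    RefactoringRecord(300, 20, "Extract Class", True, 12, 14.0),
    RefactoringRecord(100, 7, "Move Class", True, 4, 5.0),
]
query = RefactoringRecord(110, 7, "Move Class", True, 5)
print(predict_effort(history, query, k=2))
```

Here the two "Move Class" records closest to the query determine the estimate, which averages their observed 6.0 and 5.0 person-hours to 5.5. The point of the sketch is only that each suggested refactoring becomes a feature vector whose effort is inferred from similar historical refactorings.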
Results:
Evaluated on 25 open-source Java projects, REARRANGE estimated the refactoring effort of the test subjects with a Mean Absolute Error (MAE) of 5.47 person-hours, compared with an MAE of 453.31 person-hours for the next-best approach. In a survey of software developers, REARRANGE's estimates were considered accurate in 93.6% of cases.
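MAE averages the absolute difference between actual and estimated effort. An MAE of 5.47 person-hours therefore means estimates deviate from actual effort by about 5.47 hours on average. A minimal sketch of the computation follows. The function name and sample values are illustrative only, not data from the study.

```python
def mean_absolute_error(actual: list[float], predicted: list[float]) -> float:
    """MAE = (1/n) * sum(|actual_i - predicted_i|), in the same unit as the inputs."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Illustrative efforts in person-hours (hypothetical, not from the study).
actual_hours = [4.0, 10.0, 2.5]
estimated_hours = [5.0, 7.0, 2.5]
print(mean_absolute_error(actual_hours, estimated_hours))
```

The absolute errors here are 1, 3 and 0 person-hours, giving an MAE of 4/3, roughly 1.33 person-hours. Because MAE is expressed in the same unit as the estimates, it can be read directly as the typical number of hours an estimate is off by.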
Conclusion:
The absence of a directly comparable approach for REARRANGE highlights the need for an estimation model focused on refactoring effort, one that provides tangible estimates in person-hours for individual refactoring operations. Only then can developers selectively choose relevant refactoring operations within their available time and budget, bridging the gap between software clustering research and real-world application.