How do Trivial Refactorings Affect Classification Prediction Models?

Darwin Pinheiro, C. Bezerra, Anderson G. Uchôa
DOI: 10.1145/3559712.3559720
Published in: Proceedings of the 16th Brazilian Symposium on Software Components, Architectures, and Reuse
Publication date: 2022-10-03
Citation count: 0

Abstract

Refactoring is defined as a transformation that changes the internal structure of source code without changing its external behavior. Preserving external behavior means that, after a refactoring is applied, the software must produce the same output as before. Refactoring can bring several benefits, such as removing code with low structural quality, avoiding or reducing technical debt, and improving code maintainability, reusability, or readability. In this way, the benefits extend to both internal and external quality attributes. The literature on software refactoring suggests carrying out studies that invest in improving automated solutions for detecting and correcting refactorings. Furthermore, few studies investigate the influence that a less complex type of refactoring can have on predicting more complex refactorings. This paper investigates how less complex (trivial) refactorings affect the prediction of more complex (non-trivial) refactorings. To do this, we classify refactorings by their triviality, extract metrics from the code, contextualize the data, and train machine learning algorithms to investigate the resulting effect. Our results suggest that: (i) tree-based models (Random Forest and Decision Tree) performed very well when trained on code metrics to detect refactorings; (ii) separating trivial and non-trivial refactorings into different classes resulted in a more efficient model, indicating that the accuracy of machine-learning-based automated solutions can be improved; and (iii) balancing techniques that randomly add or remove samples are not the best strategy for improving datasets composed of code metrics.
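The tree-based setup described in finding (i) can be illustrated with a minimal sketch: train Random Forest and Decision Tree classifiers on a table of code metrics and predict a refactoring class. This is not the authors' pipeline; the metric dimensions, the synthetic data, and the three-way label rule (no refactoring / trivial / non-trivial) are all illustrative assumptions.

```python
# Minimal sketch (not the paper's pipeline): tree-based classifiers
# trained on code metrics to predict a refactoring class.
# The four synthetic features stand in for code metrics such as LOC,
# WMC, CBO, and RFC; the labels are generated by a toy rule.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
n = 600

# Synthetic "code metrics" matrix: one row per class under analysis.
X = rng.normal(size=(n, 4))

# Toy labels: 0 = no refactoring, 1 = trivial, 2 = non-trivial,
# derived from two of the metrics so the task is learnable.
score = X[:, 0] + 0.5 * X[:, 1]
y = np.digitize(score, [-0.5, 0.8])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

for model in (RandomForestClassifier(n_estimators=100, random_state=0),
              DecisionTreeClassifier(random_state=0)):
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(type(model).__name__, round(acc, 3))
```

On real refactoring datasets the features would come from a metrics extractor and the labels from a refactoring-detection tool; the training loop itself stays the same.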
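The balancing techniques mentioned in finding (iii) randomly add or remove samples until the classes are equal in size. A generic sketch of random oversampling is below; the dataset and the helper function are illustrative, not taken from the paper.

```python
# Generic sketch of random oversampling on an imbalanced toy dataset
# of "code metrics". Illustrative only; the paper's datasets and
# sampling tools are not reproduced here.
import numpy as np

rng = np.random.default_rng(0)

# Imbalanced toy data: 90 majority rows (label 0) vs 10 minority (label 1).
X = rng.normal(size=(100, 3))
y = np.array([0] * 90 + [1] * 10)

def random_oversample(X, y, rng):
    """Duplicate minority-class rows at random until all classes match
    the size of the largest class."""
    labels, counts = np.unique(y, return_counts=True)
    target = counts.max()
    parts_X, parts_y = [], []
    for label, count in zip(labels, counts):
        idx = np.where(y == label)[0]
        extra = rng.choice(idx, size=target - count, replace=True)
        keep = np.concatenate([idx, extra])
        parts_X.append(X[keep])
        parts_y.append(y[keep])
    return np.vstack(parts_X), np.concatenate(parts_y)

X_bal, y_bal = random_oversample(X, y, rng)
print(np.bincount(y))      # [90 10]
print(np.bincount(y_bal))  # [90 90]
```

Note that the duplicated rows carry no new information, which is one plausible reason that purely random balancing may not help models trained on code-metric datasets, consistent with finding (iii).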