Local and Global Explainability for Technical Debt Identification

IF 5.6 1区计算机科学 Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING IEEE Transactions on Software Engineering Pub Date : 2024-07-04 DOI:10.1109/TSE.2024.3422427

Dimitrios Tsoukalas;Nikolaos Mittas;Elvira-Maria Arvanitou;Apostolos Ampatzoglou;Alexander Chatzigeorgiou;Dionysios Kehagias

{"title":"Local and Global Explainability for Technical Debt Identification","authors":"Dimitrios Tsoukalas;Nikolaos Mittas;Elvira-Maria Arvanitou;Apostolos Ampatzoglou;Alexander Chatzigeorgiou;Dionysios Kehagias","doi":"10.1109/TSE.2024.3422427","DOIUrl":null,"url":null,"abstract":"In recent years, we have witnessed an important increase in research focusing on how machine learning (ML) techniques can be used for software quality assessment and improvement. However, the derived methodologies and tools lack transparency, due to the black-box nature of the employed machine learning models, leading to decreased trust in their results. To address this shortcoming, in this paper we extend the state-of-the-art and -practice by building explainable AI models on top of machine learning ones, to interpret the factors (i.e. software metrics) that constitute a module as in risk of having high technical debt (HIGH TD), to obtain thresholds for metric scores that are alerting for poor maintainability, and finally, we dig further to achieve local interpretation that explains the specific problems of each module, pinpointing to specific opportunities for improvement during TD management. To achieve this goal, we have developed project-specific classifiers (characterizing modules as HIGH and NOT-HIGH TD) for 21 open-source projects, and we explain their rationale using the SHapley Additive exPlanation (SHAP) analysis. Based on our analysis, complexity, comments ratio, cohesion, nesting of control flow statements, coupling, refactoring activity, and code churn are the most important reasons for characterizing classes as in HIGH TD risk. The analysis is complemented with global and local means of interpretation, such as metric thresholds and case-by-case reasoning for characterizing a class as in-risk of having HIGH TD. The results of the study are compared against the state-of-the-art and are interpreted from the point of view of both researchers and practitioners.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"50 8","pages":"2110-2123"},"PeriodicalIF":5.6000,"publicationDate":"2024-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Software Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10586898/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}

引用次数: 0

Abstract

In recent years, we have witnessed an important increase in research focusing on how machine learning (ML) techniques can be used for software quality assessment and improvement. However, the derived methodologies and tools lack transparency, due to the black-box nature of the employed machine learning models, leading to decreased trust in their results. To address this shortcoming, in this paper we extend the state-of-the-art and -practice by building explainable AI models on top of machine learning ones, to interpret the factors (i.e. software metrics) that constitute a module as in risk of having high technical debt (HIGH TD), to obtain thresholds for metric scores that are alerting for poor maintainability, and finally, we dig further to achieve local interpretation that explains the specific problems of each module, pinpointing to specific opportunities for improvement during TD management. To achieve this goal, we have developed project-specific classifiers (characterizing modules as HIGH and NOT-HIGH TD) for 21 open-source projects, and we explain their rationale using the SHapley Additive exPlanation (SHAP) analysis. Based on our analysis, complexity, comments ratio, cohesion, nesting of control flow statements, coupling, refactoring activity, and code churn are the most important reasons for characterizing classes as in HIGH TD risk. The analysis is complemented with global and local means of interpretation, such as metric thresholds and case-by-case reasoning for characterizing a class as in-risk of having HIGH TD. The results of the study are compared against the state-of-the-art and are interpreted from the point of view of both researchers and practitioners.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

识别技术债务的局部和全局可解释性

近年来，关于如何将机器学习（ML）技术用于软件质量评估和改进的研究大幅增加。然而，由于所使用的机器学习模型的黑箱性质，衍生出的方法和工具缺乏透明度，导致对其结果的信任度降低。为了解决这一缺陷，我们在本文中通过在机器学习模型的基础上建立可解释的人工智能模型，对构成模块具有高技术债务（HIGH TD）风险的因素（即软件度量指标）进行解释，以获得对可维护性差发出警报的度量指标得分阈值，最后，我们进一步挖掘以实现局部解释，解释每个模块的具体问题，在 TD 管理过程中指出具体的改进机会。为了实现这一目标，我们为 21 个开源项目开发了针对项目的分类器（将模块定性为高 TD 和非高 TD），并使用 SHapley Additive exPlanation（SHAP）分析法解释了分类器的原理。根据我们的分析，复杂性、注释比例、内聚性、控制流语句嵌套、耦合、重构活动和代码流失是将类定性为高 TD 风险的最重要原因。该分析还辅以全局和局部的解释手段，如度量阈值和逐案推理，以确定某个类是否具有高 TD 风险。研究结果与最先进的方法进行了比较，并从研究人员和从业人员的角度进行了解释。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

IEEE Transactions on Software Engineering 工程技术-工程：电子与电气

CiteScore

9.70

自引率

10.80%

发文量

724

审稿时长

6 months

期刊介绍： IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include: a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models. b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects. c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards. d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues. e) System issues: Hardware-software trade-offs. f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.