2023 ACM/IEEE International Conference on Technical Debt (TechDebt): Latest Articles

Technical Debt Classification in Issue Trackers using Natural Language Processing based on Transformers

2023 ACM/IEEE International Conference on Technical Debt (TechDebt)

Pub Date : 2023-05-01 DOI: 10.1109/TechDebt59074.2023.00017

Daniel Skryseth, Karthik Shivashankar, I. Pilán, A. Martini

Background: Technical Debt (TD) needs to be controlled and tracked during software development. Support for automatically tracking TD in issue trackers is limited. Aim: We explore the use of a large dataset of developer-labeled TD issues in combination with cutting-edge Natural Language Processing (NLP) approaches to automatically classify TD in issue trackers. Method: We mine and analyze more than 160 GB of textual data from GitHub projects, collecting over 55,600 TD issues and consolidating them into a large dataset (GTD dataset). We use this dataset to train and test Transformer ML models. We then assess the models’ generalization ability by evaluating them on six unseen projects. Finally, we re-train the models with part of the TD issues from the target project included, to test their adaptability. Results and conclusion: (i) We create and release the GTD dataset, a comprehensive dataset including TD issues from 6,401 public repositories with various contexts; (ii) By training Transformers using the GTD dataset, we achieve performance metrics that are promising; (iii) Our results are a significant step forward towards supporting the automatic classification of TD in issue trackers, especially when the models are adapted to the context of unseen projects after fine-tuning.
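
The evaluation protocol in the Method (train on the pooled data, test on an unseen project, then re-train with part of the target project's issues) can be sketched as two data splits. This is a minimal illustration under stated assumptions, not the authors' code: the `Issue` record, both function names, and the `fraction` and `seed` parameters are hypothetical, and the Transformer training step itself is left out.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    """Hypothetical record for one labeled issue."""
    project: str
    text: str
    is_td: bool


def unseen_project_split(issues, target):
    """Generalization setup: train on every project except `target`, test on `target`."""
    train = [i for i in issues if i.project != target]
    test = [i for i in issues if i.project == target]
    return train, test


def adaptation_split(issues, target, fraction, seed=0):
    """Adaptation setup: move `fraction` of the target project's issues into training."""
    train, target_issues = unseen_project_split(issues, target)
    shuffled = list(target_issues)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    k = int(len(shuffled) * fraction)
    return train + shuffled[:k], shuffled[k:]
```

Comparing a model's scores on the test part of both splits would show how much seeing some target-project issues helps.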

Citations: 0

Towards identifying and minimizing customer-facing documentation debt

2023 ACM/IEEE International Conference on Technical Debt (TechDebt)

Pub Date : 2023-05-01 DOI: 10.1109/TechDebt59074.2023.00015

Lakmal Silva, M. Unterkalmsteiner, K. Wnuk

Background: Software documentation often struggles to keep up with the pace of software evolution. The lack of correct, complete, and up-to-date documentation results in an increasing number of documentation defects, which could introduce delays in integrating software systems. In our previous study on a bug analysis tool called MultiDimEr, we provided evidence that documentation-related defects contribute to a significant number of bug reports. Aims: First, we want to identify the documentation defect types that contribute to documentation defects and thereby identify documentation debt. Second, we aim to find pragmatic solutions to minimize the most common documentation defects and pay off the documentation debt in the long run. Method: We investigated documentation defects related to an industrial software system. First, we looked at the different types of documentation and the associated bug reports. We then categorized the defects according to an existing documentation defect taxonomy. Results: Based on a sample of 101 defects, we found that a majority of defects (86) are caused by documentation defects falling into the Information Content (What) category. Within this category, the documentation defect types Erroneous code examples (23), Missing documentation (35), and Outdated content (19) contributed to most of the documentation defects. We propose to adapt two solutions to mitigate these types of documentation defects. Conclusions: In practice, documentation debt can easily go undetected since a large share of resources and focus is dedicated to delivering high-quality software. This study provides evidence that documentation debt can contribute to an increase in maintenance costs due to the number of documentation defects. We suggest adapting two main solutions to tackle documentation debt by implementing (i) Dynamic Documentation Generation (DDG) and/or (ii) Automated Documentation Testing (ADT), which are both based on defining a single and robust information source for documentation.
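
One familiar form of automated documentation testing is to execute the code examples embedded in the documentation and compare their output with what the text claims. The abstract does not say how ADT is implemented in the study, so the sketch below is only an assumption-laden illustration built on Python's standard `doctest` module; the function name `check_examples` and the sample text `DOC` are hypothetical.

```python
import doctest

# Hypothetical documentation snippet: the second example is deliberately outdated.
DOC = """
Adding two numbers:

>>> 1 + 1
2

Upper-casing a string:

>>> "td".upper()
'Td'
"""


def check_examples(doc_text, name="doc"):
    """Run the `>>>` examples in `doc_text` and return (attempted, failed)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(doc_text, {}, name, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report text; only the counts are used here.
    results = runner.run(test, out=lambda text: None)
    return results.attempted, results.failed


if __name__ == "__main__":
    attempted, failed = check_examples(DOC)
    print(f"{failed} of {attempted} documented examples no longer match")
```

A check of this kind targets the "Erroneous code examples" and "Outdated content" defect types; it cannot detect documentation that is missing altogether.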

Citations: 0

How to introduce TD Management into a Software Development Process – A Practical Approach

2023 ACM/IEEE International Conference on Technical Debt (TechDebt)

Pub Date : 2023-05-01 DOI: 10.1109/TechDebt59074.2023.00013

Markus Finke, Thomas J. Neff, Tobias Reichl

This paper presents a process for management of technical debt (TD) and how it can be integrated sustainably into the existing software development process in an industrial context. A holistic approach is pursued where the development team and the management team agree on a common understanding of TD. From this, requirements for the process are derived, which should support the medium- and long-term planning of the roadmap. By iteratively evaluating the technical debt in the context of the feature roadmap, an internal development benefit (debt repayment) can be combined with an external benefit (new features), while optimizing development costs at the same time. The first results illustrate the positive effect of the continuous repayment of technical debt.

Citations: 0

Resolving Security Issues via Quality-Oriented Refactoring: A User Study

2023 ACM/IEEE International Conference on Technical Debt (TechDebt)

Pub Date : 2023-05-01 DOI: 10.1109/TechDebt59074.2023.00016

Domenico Gigante, Fabiano Pecorelli, Vita Santa Barletta, Andrea Janes, Valentina Lenarduzzi, D. Taibi, M. T. Baldassarre

Software quality is crucial in software development: if not addressed in the early phases of the software development life cycle, poor quality may even lead to technical bankruptcy, i.e., a situation in which modifications cost more than redeveloping the application from scratch. In addition, code security must be addressed to reduce software vulnerabilities and to comply with legal requirements. In this work, we investigate the relationship between refactoring for code quality and software security, with the purpose of understanding whether and to what extent improving software quality could also have a positive impact on software security. Specifically, we investigate to what extent rule violations of a software quality tool such as SonarQube overlap with rule violations of a software vulnerability tool like Fortify Static Code Analyzer. We first compared the rules encoded in the quality models of both tools to discover possible overlapping cases. We then compared the issues raised by both tools on a set of open source Java projects; we also investigated the cases in which a quality refactoring process impacts software security (thus removing one or more vulnerabilities). We furthermore validated our results statistically. Our results show that resolving software quality issues might also resolve security issues, but only in part: many security issues still persist in the source code, and some quality aspects are more likely to be improved than others. In addition, this empirical study uncovers rule co-occurrences between the two tools. This study confirms the need for using a security-oriented static analysis tool to enforce software security instead of relying only on a quality-oriented one. The results highlight important insights for practitioners.
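
The notion of rule co-occurrence between the two tools can be illustrated with a small counting routine. This is a sketch under assumptions rather than the study's procedure: it treats two issues as co-occurring when both tools flag the same file and line, and the `(file, line, rule)` tuple format and the function name are hypothetical. Real exports from SonarQube or Fortify would need their own parsing.

```python
from collections import Counter, defaultdict


def rule_cooccurrences(quality_issues, security_issues):
    """Count (quality rule, security rule) pairs raised at the same file and line.

    Both arguments are iterables of (file, line, rule) tuples.
    """
    quality_by_location = defaultdict(set)
    for file, line, rule in quality_issues:
        quality_by_location[(file, line)].add(rule)

    pairs = Counter()
    for file, line, security_rule in security_issues:
        for quality_rule in quality_by_location.get((file, line), ()):
            pairs[(quality_rule, security_rule)] += 1
    return pairs
```

Running the same count before and after a refactoring would indicate which security findings disappeared together with a quality fix.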

Citations: 0

Identifying Code Changes for Architecture Decay via a Metric Forest Structure

2023 ACM/IEEE International Conference on Technical Debt (TechDebt)

Pub Date : 2023-05-01 DOI: 10.1109/TechDebt59074.2023.00014

Wuxia Jin, Yuyun Zhang, Jiaowei Shang, Yi Hou, Ming Fan, Ting Liu

During long-term software evolution, it is inevitable that an accumulation of changes leads to architectural erosion and debt. Diverse metric-based methods have been developed to identify architectural problems that violate design principles and degrade software maintainability. However, a gap still exists between implementation-level metrics and architecture-level metrics. This gap hinders the comprehensibility, interpretability, and indicative(-bility) of the measurement results. To fill it, we propose dbMIT to identify code changes that potentially cause architecture decay. dbMIT first integrates popular metrics such as the CK suite. It then constructs a forest structure of metrics, which serves as a knowledge base linking architecture-level metrics to implementation-level metrics. Via pre-defined rules over the forest structure, dbMIT identifies code changes that potentially cause architecture decay. The forest structure makes it easy for developers to understand detection results, explain why the detected code changes are potential contributors to the decay, and indicate the code scope for resolution. We also contribute a web-based tool that integrates dbMIT. Our experiments on the collected projects demonstrate the effectiveness of dbMIT against a history-based ground truth.
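
To make the idea of linking metric levels concrete, the following sketch models one tree of such a forest and walks it from an architecture-level metric down to implementation-level leaves. Everything here is a hypothetical illustration: the abstract does not give dbMIT's actual forest, node names, or rules, so the `MetricNode` type and the rule "report a path only if every metric on it worsened" are assumptions. CBO and RFC are used as leaf names only because they belong to the CK suite mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class MetricNode:
    """One metric in a tree; children are finer-grained metrics that explain it."""
    name: str
    children: list = field(default_factory=list)


def worsened_paths(node, worsened, path=()):
    """Yield root-to-leaf paths along which every metric is in the `worsened` set."""
    if node.name not in worsened:
        return
    path = path + (node.name,)
    if not node.children:
        yield path
        return
    for child in node.children:
        yield from worsened_paths(child, worsened, path)


if __name__ == "__main__":
    tree = MetricNode("module_coupling", [MetricNode("CBO"), MetricNode("RFC")])
    for p in worsened_paths(tree, {"module_coupling", "CBO"}):
        print(" -> ".join(p))
```

A path reported this way gives a developer both the architecture-level symptom and the implementation-level metric to inspect, which is the kind of explanation the forest is meant to support.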

Citations: 0

Technical Debt Contagiousness Metrics for Measurement and Prioritization in Mechatronics

2023 ACM/IEEE International Conference on Technical Debt (TechDebt)

Pub Date : 2023-05-01 DOI: 10.1109/TechDebt59074.2023.00012

Fandi Bi, Birgit Vogel-Heuser, Fengmin Du, Nils Hanich, Ennuri Cho

Underlying and undiscovered technical debt (TD) that burdens the system and makes future changes more costly or impossible poses significant risks to mechatronic systems. Multi-disciplinary collaboration and cooperation lead to interdisciplinary interfaces and new life cycle phases that cause greater ripple effects on the TD distribution. When quantifying TD contagiousness in interdisciplinary engineering, only a few metrics, methods, or tools prove applicable. In this work, we propose a method containing two key metrics to quantify TD contagiousness across product life cycles and disciplines. Furthermore, we suggest a matrix multiplication method to quantify the adverse impact on each discipline and on the system. Applying the methods to the data of three comparable companies in the industrial automation domain enables us to measure and prioritize the contagiousness of TD incidents. This method provides a first step towards the systematic quantification of TD in the interdisciplinary environment and provides metrics to compare systems based on objective factors.
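
The abstract does not define the matrices involved, so the following is only one plausible reading of a matrix multiplication for per-discipline impact: a weight vector over TD incidents multiplied by an incident-by-discipline spread matrix. The function names, the matrix layout, and all numbers are hypothetical and chosen purely for illustration.

```python
def discipline_impact(spread, weights):
    """Multiply weights (1 x n incidents) by spread (n incidents x m disciplines).

    spread[i][d] is how strongly incident i affects discipline d;
    weights[i] is the weight given to incident i.
    """
    n_disciplines = len(spread[0])
    return [
        sum(w * row[d] for w, row in zip(weights, spread))
        for d in range(n_disciplines)
    ]


def system_impact(spread, weights):
    """Aggregate the per-discipline impacts into one system-level number."""
    return sum(discipline_impact(spread, weights))


if __name__ == "__main__":
    # Hypothetical columns: mechanical, electrical, software.
    spread = [[1, 0, 2],
              [0, 3, 1]]
    weights = [2, 1]
    print(discipline_impact(spread, weights), system_impact(spread, weights))
```

Sorting incidents by their individual contribution to the system total would give one simple prioritization order.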

Citations: 0

Exploring the Effect of Various Maintenance Activities on the Accumulation of TD Principal

2023 ACM/IEEE International Conference on Technical Debt (TechDebt)

Pub Date : 2023-05-01 DOI: 10.1109/TechDebt59074.2023.00018

Nikolaos Nikolaidis, Apostolos Ampatzoglou, A. Chatzigeorgiou, N. Mittas, E. Konstantinidis, P. Bamidis

One of the most well-known laws of software evolution suggests that code quality deteriorates over time. Following this law, recent empirical studies have brought evidence that Technical Debt (TD) Principal tends to increase (in absolute value) as the system grows, since more technical debt issues are added than resolved over time. To shed light on how technical debt accumulation occurs in practice, in this paper we examine specific maintenance activities (i.e., feature addition, bug fixing, and refactoring) and explore the balance between the technical debt that they introduce and the debt that they resolve. To achieve this goal, we rely on studying Pull Requests (PRs), which are the most established way to contribute code to an open-source project. A Pull Request usually comprises more than one commit and corresponds to a specific development / maintenance activity. In our study, we categorized Pull Requests based on their labels to find the effect that the different maintenance activities have on the accumulation of technical debt across evolution. In particular, we analysed more than 13.5K pull requests (mined from 10 OSS projects) by calculating the TD Principal (through SonarQube) before and after each Pull Request. The results of the study suggest that several labels are used for tagging Pull Requests, of which the most prevalent are new features, bug fixing, and refactoring. The effects of these activities on TD Principal accumulation are statistically different: (a) the addition of features tends to increase TD Principal; (b) refactoring has an almost consistently positive effect (reducing TD Principal); and (c) bug fixing has an inconclusive impact on TD Principal. These results are compared to existing studies and interpreted, and various useful implications for researchers and practitioners are drawn.
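
The before/after comparison per maintenance activity can be sketched as a grouped difference. This is an illustrative simplification and not the study's analysis pipeline: the `(label, td_before, td_after)` tuple format, the function name, and the choice of the median as summary statistic are assumptions, and the values stand for TD Principal in whatever unit the analysis tool reports.

```python
from collections import defaultdict
from statistics import median


def principal_delta_by_label(pull_requests):
    """Return the median change in TD Principal per pull-request label.

    `pull_requests` is an iterable of (label, td_before, td_after) tuples.
    A positive value means pull requests with that label tend to add debt.
    """
    deltas = defaultdict(list)
    for label, before, after in pull_requests:
        deltas[label].append(after - before)
    return {label: median(values) for label, values in deltas.items()}
```

A statistical test across the per-label delta lists, rather than a single summary number, would be needed to support a claim that the activities differ.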

Citations: 0

Copyright Page

2023 ACM/IEEE International Conference on Technical Debt (TechDebt)

Pub Date : 2023-05-01 DOI: 10.1109/techdebt59074.2023.00003

Citations: 0

Automated Self-Admitted Technical Debt Tracking at Commit-Level: A Language-independent Approach

2023 ACM/IEEE International Conference on Technical Debt (TechDebt)

Pub Date : 2023-04-16 DOI: 10.1109/TechDebt59074.2023.00009

M. Sheikhaei, Yuan Tian

Software and systems traceability is essential for downstream tasks such as data-driven software analysis and intelligent tool development. However, despite the increasing attention to mining and understanding technical debt in software systems, specific tools that support tracking technical debt are rarely available. In this work, we propose the first programming language-independent tracking tool for self-admitted technical debt (SATD) – a sub-optimal solution that is explicitly annotated by developers in software systems. Our approach takes a git repository as input and returns a list of SATDs with their evolution actions (created, deleted, updated) at the commit level. It also returns a line number indicating the latest starting position of the corresponding SATD in the system. Our SATD tracking approach first identifies an initial set of raw SATDs (which only have created and deleted actions) by detecting and tracking SATDs in commits’ hunks, leveraging a state-of-the-art language-independent SATD detection approach. It then calculates a context-based matching score between pairs of deleted and created raw SATDs in the same commit to identify SATD update actions. The results of our preliminary study on Apache Tomcat and Apache Ant show that our tracking tool can achieve an F1 score of 92.8% and 96.7%, respectively.
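
The second step, pairing deleted and created raw SATDs within one commit to recognize updates, can be sketched as a greedy similarity matching. This is not the paper's scoring function: plain text similarity from `difflib.SequenceMatcher` stands in for the context-based matching score, and the function name and the `threshold` value of 0.6 are assumptions made for the example.

```python
from difflib import SequenceMatcher


def classify_actions(deleted, created, threshold=0.6):
    """Pair deleted and created SATD comments from the same commit.

    Pairs whose similarity reaches `threshold` are reported as updates;
    the remaining comments stay plain deletions or creations.
    """
    candidates = sorted(
        (
            (SequenceMatcher(None, d, c).ratio(), di, ci)
            for di, d in enumerate(deleted)
            for ci, c in enumerate(created)
        ),
        reverse=True,
    )
    used_deleted, used_created, updates = set(), set(), []
    for score, di, ci in candidates:
        if score < threshold:
            break  # candidates are sorted, so no later pair can qualify
        if di in used_deleted or ci in used_created:
            continue  # each comment takes part in at most one update
        used_deleted.add(di)
        used_created.add(ci)
        updates.append((deleted[di], created[ci]))
    return {
        "updated": updates,
        "deleted": [d for i, d in enumerate(deleted) if i not in used_deleted],
        "created": [c for i, c in enumerate(created) if i not in used_created],
    }
```

Matching on surrounding code context rather than on the comment text alone, as the abstract describes, would make the pairing more robust when a comment is heavily reworded.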

Citations: 0

Measuring Improvement of F1-Scores in Detection of Self-Admitted Technical Debt

2023 ACM/IEEE International Conference on Technical Debt (TechDebt)

Pub Date : 2023-03-16 DOI: 10.1109/TechDebt59074.2023.00011

William Aiken, Paul K. Mvula, Paula Branco, Guy-Vincent Jourdan, M. Sabetzadeh, H. Viktor

Artificial Intelligence and Machine Learning have brought rapid, significant improvements to Natural Language Processing (NLP) tasks. Utilizing Deep Learning, researchers have taken advantage of repository comments in Software Engineering to produce accurate methods for detecting Self-Admitted Technical Debt (SATD) in the code of 20 open-source Java projects. In this work, we improve SATD detection with a novel approach that leverages the Bidirectional Encoder Representations from Transformers (BERT) architecture. For comparison, we re-evaluated previous deep learning methods and applied stratified 10-fold cross-validation to report reliable F1-scores. We examine our model in both cross-project and intra-project contexts. For each context, we use re-sampling and duplication as augmentation strategies to account for data imbalance. We find that our trained BERT model improves over the best performance of all previous methods in 19 of the 20 projects in cross-project scenarios. However, the data augmentation techniques were not sufficient to overcome the lack of data in the intra-project scenarios, where existing methods still perform better. Future research will look into ways to diversify SATD datasets in order to maximize the latent power of large BERT models.
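The evaluation protocol mentioned in this abstract (stratified 10-fold cross-validation, duplication to counter class imbalance, F1-score reporting) can be sketched in plain Python. This is an illustrative sketch only, not the authors' pipeline: the function names are hypothetical, "duplication" is assumed to mean repeating minority-class examples, and the toy labels are invented for the example.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign each index to one of k folds, keeping class ratios similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal round-robin within each class
    return folds


def oversample_by_duplication(indices, labels, positive=1, seed=0):
    """Duplicate positive (minority) examples until both classes are equal in size."""
    rng = random.Random(seed)
    pos = [i for i in indices if labels[i] == positive]
    neg = [i for i in indices if labels[i] != positive]
    if not pos or len(pos) >= len(neg):
        return list(indices)
    extra = [rng.choice(pos) for _ in range(len(neg) - len(pos))]
    return list(indices) + extra


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: 20 SATD comments (label 1) among 100 comments in total.
labels = [1] * 20 + [0] * 80
folds = stratified_folds(labels, k=10)

# Hold out fold 0 for testing; balance only the training portion.
train = [i for fold in folds[1:] for i in fold]
balanced_train = oversample_by_duplication(train, labels)
print(len(train), len(balanced_train))
print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

Duplication is applied to the training folds only, so the held-out fold keeps its original class distribution and the reported F1-score is not inflated by repeated test examples.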

Citations: 1