Understanding Code Understandability Improvements in Code Reviews

IF 5.6 | Region 1, Computer Science | Q1 Computer Science, Software Engineering | IEEE Transactions on Software Engineering | Pub Date: 2024-09-10 | DOI: 10.1109/TSE.2024.3453783

Delano Oliveira;Reydne Santos;Benedito de Oliveira;Martin Monperrus;Fernando Castor;Fernanda Madeiral

Published in: IEEE Transactions on Software Engineering, vol. 51, no. 1, pp. 14-37. Full text: https://ieeexplore.ieee.org/document/10670481/


Abstract

Context: Code understandability plays a crucial role in software development, as developers spend between 58% and 70% of their time reading source code. Improving code understandability can lead to enhanced productivity and save maintenance costs.

Problem: Experimental studies aim to establish what makes code more or less understandable in a controlled setting, but ignore that what makes code easier to understand in the real world also depends on extraneous elements such as developers’ background and project culture and guidelines. Not accounting for the influence of these factors may lead to results that are sound but have little external validity.

Goal: We aim to investigate how developers improve code understandability during software development through code review comments. Our assumption is that code reviewers are specialists in code quality within a project.

Method and Results: We manually analyzed 2,401 code review comments from Java open-source projects on GitHub and found that over 42% of all comments focus on improving code understandability, demonstrating the significance of this quality attribute in code reviews. We further explored a subset of 385 comments related to code understandability and identified eight categories of code understandability concerns, such as incomplete or inadequate code documentation, bad identifier, and unnecessary code. Among the suggestions to improve code understandability, 83.9% were accepted and integrated into the codebase. Among these, only two (less than 1%) ended up being reverted later. We also identified types of patches that improve code understandability, ranging from simple changes (e.g., removing unused code) to more context-dependent improvements (e.g., replacing method calling chains by existing API). Finally, we investigated the potential coverage of four well-known linters to flag the identified code understandability issues. These linters cover less than 30% of these issues, although some of them could be easily added as new rules.

Implications: Our findings motivate and provide practical insight for the construction of tools to make code more understandable, e.g., understandability improvements are rarely reverted and thus can be used as reliable training data for specialized ML-based tools. This is also supported by our dataset, which can be used to train such models. Finally, our findings can also serve as a basis to develop evidence-based code style guides.
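The two kinds of patches named under Method and Results (removing unused code, and replacing a method call chain with an existing API) can be pictured with a small Java sketch. This is only an illustration under stated assumptions: the class `UnderstandabilityPatches` and both methods are hypothetical and are not taken from the paper or its dataset.

```java
import java.util.List;

// Hypothetical before/after pair; not drawn from the study's dataset.
class UnderstandabilityPatches {

    // Before: carries an unused local variable and a roundabout call chain.
    static boolean hasNoNamesBefore(List<String> names) {
        int unusedCounter = 0; // never read: a candidate for "removing unused code"
        return names.stream().count() == 0;
    }

    // After: the unused variable is gone and the chain is replaced
    // by the existing List.isEmpty() API, which states the intent directly.
    static boolean hasNoNamesAfter(List<String> names) {
        return names.isEmpty();
    }

    public static void main(String[] args) {
        List<String> empty = List.of();
        List<String> some = List.of("reviewer");
        // Both versions agree, so the patch changes readability, not behavior.
        System.out.println(hasNoNamesBefore(empty) == hasNoNamesAfter(empty)); // prints true
        System.out.println(hasNoNamesBefore(some) == hasNoNamesAfter(some));   // prints true
    }
}
```

The first change needs no project context, while the second depends on the reviewer knowing that a suitable API already exists, which matches the abstract's contrast between simple and context-dependent improvements.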




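Source journal

IEEE Transactions on Software Engineering (Engineering Technology - Engineering: Electrical and Electronic)

CiteScore: 9.70

Self-citation rate: 10.80%

Articles published: 724

Review time: 6 months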

Journal description: IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:

a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.

b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.

c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.

d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.

e) System issues: Hardware-software trade-offs.

f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.