Fixing Dockerfile smells: an empirical study

Empirical Software Engineering (IF 3.5; Zone 2, Computer Science; JCR Q1, Computer Science, Software Engineering). Pub Date: 2024-07-06. DOI: 10.1007/s10664-024-10471-7

Giovanni Rosa, Federico Zappone, Simone Scalabrino, Rocco Oliveto



Abstract

Docker is the de facto standard for software containerization. A Dockerfile contains the requirements to build a Docker image containing a target application. There are several best-practice rules for writing Dockerfiles, but developers do not always follow them. Violations of such practices, known as Dockerfile smells, can negatively impact the reliability and performance of Docker images. Previous studies showed that Dockerfile smells are widespread, and that there is a lack of automatic tools that support developers in fixing them. However, it is still unclear which Dockerfile smells get fixed by developers and to what extent developers would be willing to fix smells in the first place. The aim of our study is twofold. First, we want to understand which Dockerfile smells receive more attention from developers, i.e., are fixed more frequently in the history of open-source projects. Second, we want to check whether developers are willing to accept changes aimed at fixing Dockerfile smells (e.g., generated by an automated tool), to understand whether they care about them. We evaluated the survivability of Dockerfile smells in a total of 53,456 unique Dockerfiles, and manually validated a large sample of smell-removing commits to understand (i) whether developers performed the change with the intention of removing bad practices, and (ii) whether they were aware of the removed smell. In the second part, we used a rule-based tool to automatically fix Dockerfile smells. Then, we proposed such fixes to developers via pull requests. Finally, we quantitatively and qualitatively evaluated the outcome after a monitoring period of more than 7 months. The results of our study showed that most developers pay more attention to changes aimed at improving the performance of Dockerfiles (image size and build time). Moreover, they are willing to accept the fixes for the most common smells, with some exceptions (e.g., missing version pinning for OS packages).
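To make the notion of a rule-based smell check concrete, here is a minimal sketch in Python. It is not the tool used in the study: the rule names, the helper functions, and the sample Dockerfile are all hypothetical, chosen only for illustration. It flags two patterns in `apt-get install` commands: packages without a pinned version (the exception named in the abstract) and a missing `--no-install-recommends` flag, which is assumed here as an example of a practice related to image size.

```python
import re


def logical_instructions(text):
    """Join backslash-continued lines into (start_line, instruction) pairs."""
    result, buffer, start = [], "", None
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not buffer and (not stripped or stripped.startswith("#")):
            continue  # skip blank lines and comments between instructions
        if start is None:
            start = number
        if stripped.endswith("\\"):
            buffer += stripped[:-1] + " "
            continue
        result.append((start, buffer + stripped))
        buffer, start = "", None
    return result


def find_smells(text):
    """Return (line, rule, packages) tuples; rule names are hypothetical."""
    findings = []
    for line_no, instruction in logical_instructions(text):
        if not instruction.upper().startswith("RUN "):
            continue
        # Split the shell command on && and ; so each apt-get call is checked alone.
        for command in re.split(r"&&|;", instruction[4:]):
            tokens = command.split()
            if not tokens or tokens[0] != "apt-get" or "install" not in tokens:
                continue
            args = tokens[tokens.index("install") + 1:]
            packages = [a for a in args if not a.startswith("-")]
            # A pinned apt package is written as name=version.
            unpinned = [p for p in packages if "=" not in p]
            if unpinned:
                findings.append((line_no, "unpinned-os-package", unpinned))
            if "--no-install-recommends" not in tokens:
                findings.append((line_no, "missing-no-install-recommends", []))
    return findings


# Hypothetical Dockerfile used only to exercise the sketch.
SAMPLE = "\n".join([
    "FROM ubuntu:22.04",
    "RUN apt-get update && \\",
    "    apt-get install -y curl git=1:2.34.1-1ubuntu1",
    "RUN apt-get install -y --no-install-recommends wget=1.21.2-2ubuntu1",
])

for finding in find_smells(SAMPLE):
    print(finding)
```

A real linter would parse the shell command properly instead of splitting on `&&` and `;`; the token-based check above is only meant to show how a single rule maps a textual pattern to a reported smell.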

