An empirical study on cross-component dependent changes: A case study on the components of OpenStack
Pub Date : 2024-07-13 | DOI: 10.1007/s10664-024-10488-y
Ali Arabat, Mohammed Sayagh
Modern software systems are composed of several loosely coupled components. Typical examples of such systems are plugin-based systems, microservices, and modular software systems. Such systems have several advantages, motivating a large body of research on approaches for migrating from monolithic software systems to modular architectures (mainly microservices). However, only a few prior works have investigated how to assist practitioners post-migration. In fact, these studies reported that having truly independent components is difficult to achieve, leading to several evolution challenges that are still handled manually. In this paper, we conduct an empirical study on OpenStack and its 1,310 projects (a.k.a. components) to better understand how changes to a given component depend on changes to other components (a.k.a. cross-component changes), so that managers can better plan their changes in a cross-component project and researchers can design better solutions to help practitioners with the co-evolution and maintenance of multi-component software systems. We observe that the concept of ownership exists in the context of OpenStack, as different teams do not share responsibility over the studied components. Despite that, dependencies across different components are not exceptional but exist in all releases. In fact, we observe that 52,069 OpenStack changes (almost 10% of all changes) depend on changes in other components. The number of cross-component changes continuously increased over the years and releases, up to the release in which OpenStack decided to make a major refactoring of its project by archiving over 500 projects. We also found that a considerable percentage of cross-component changes (20.85%) end up being abandoned, leading to wasted synchronization effort between different teams. These dependent changes occur for different reasons that we qualitatively identified, among which configuration-related changes (34.64%) are the most common, while cross-component changes created for testing purposes and then abandoned form the most prevalent category of abandoned changes (38.45%). These cross-project changes require collaboration between different teams to synchronize their changes, since 24.55% of the pairs of cross-component changes are made by different developers, while the second change of a pair is often reviewed by the developer of the first change (71.63%). Even when a single developer makes both changes, that developer ends up working on a project with which they are less familiar. Our results shed light on how different components end up depending on each other in terms of their maintenance, which can help managers better plan their changes and guide researchers in proposing appropriate approaches for assisting in the maintenance of multi-component systems.
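OpenStack declares cross-repository dependencies through "Depends-On:" footers in Gerrit commit messages (the Zuul convention). Whether the study relies on exactly this signal is an assumption; the following is a minimal sketch of how cross-component dependent changes could be mined from such footers, with illustrative helper names and data:

import re
from collections import defaultdict

# Matches "Depends-On: <change-id or URL>" footers in commit messages.
DEPENDS_ON = re.compile(r"^Depends-On:\s*(\S+)", re.IGNORECASE | re.MULTILINE)

def extract_dependencies(commit_message):
    # Return the change identifiers this change declares as dependencies.
    return DEPENDS_ON.findall(commit_message)

def count_cross_component(changes):
    # changes: iterable of (project, change_id, commit_message) tuples.
    # Counts dependency pairs whose target change lives in another project.
    project_of = {change_id: project for project, change_id, _ in changes}
    cross = defaultdict(int)
    for project, change_id, message in changes:
        for dep in extract_dependencies(message):
            dep_project = project_of.get(dep)
            if dep_project is not None and dep_project != project:
                cross[(project, dep_project)] += 1
    return cross

changes = [
    ("nova", "I111", "Add API field\n\nDepends-On: I222"),
    ("cinder", "I222", "Expose new volume attribute"),
]
print(dict(count_cross_component(changes)))  # {('nova', 'cinder'): 1}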
{"title":"An empirical study on cross-component dependent changes: A case study on the components of OpenStack","authors":"Ali Arabat, Mohammed Sayagh","doi":"10.1007/s10664-024-10488-y","DOIUrl":"https://doi.org/10.1007/s10664-024-10488-y","url":null,"abstract":"<p>Modern software systems are composed of several loosely coupled components. Typical examples of such systems are plugin-based systems, microservices, and modular software systems. Such types of software systems have several advantages motivating a large body of research to propose approaches for the migration from monolithic software systems to modular architecture (mainly microservices). However, a few prior works investigated how to assist practitioners post-migration. In fact, these studies reported that having independent components is difficult to achieve, leading to several evolution challenges that are still manually handled. In this paper, we conduct an empirical study on OpenStack and its 1,310 projects (aka., components) to better understand how the changes to a given component depend on changes of other components (aka., cross-component changes) so managers can better plan for their changes in a cross-component project, and researchers can design better solutions to help practitioners in such a co-evolution and the maintenance of multi-component software systems. We observe that the concept of ownership exists in the context of OpenStack, as different teams do not share the responsibility over the studied components of OpenStack. Despite that, dependencies across different components are not exceptional but exist in all releases. In fact, we observe that 52,069 OpenStack changes (almost 10% of all the changes) depend on changes in other components. Such a number of cross-component changes continuously increased over different years and releases, up to a certain release in which OpenStack decided to make a major refactoring of its project by archiving over 500 projects. We also found that a good percentage of cross-component changes (20.85%) end up being abandoned, leading to wasteful synchronization efforts between different teams. These dependent changes occur for different reasons that we qualitatively identified, among which configuration-related (34.64%) changes are the most common, while developers create cross-component changes for testing purposes then abandon such changes as the most prevalent category (38.45%). These cross-project changes lead to collaboration between different teams to synchronize their changes since 24.55% of the pairs of two cross-component changes are made by different developers, while the second change is reviewed by the developer of the first change of the pair (71.63%). Even when a developer makes both changes, that developer ends up working on a project that she/he is less familiar with. 
Our results shed light on how different components end up being dependent on each other in terms of their maintenance, which can help managers better plan their changes and guide researchers in proposing appropriate approaches for assisting in the maintenance of multi-component systems.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"93 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141613761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fixing Dockerfile smells: an empirical study
Pub Date : 2024-07-06 | DOI: 10.1007/s10664-024-10471-7
Giovanni Rosa, Federico Zappone, Simone Scalabrino, Rocco Oliveto
Docker is the de facto standard for software containerization. A Dockerfile contains the requirements to build a Docker image containing a target application. There are several best-practice rules for writing Dockerfiles, but developers do not always follow them. Violations of such practices, known as Dockerfile smells, can negatively impact the reliability and performance of Docker images. Previous studies showed that Dockerfile smells are widely diffused and that there is a lack of automatic tools to support developers in fixing them. However, it is still unclear which Dockerfile smells get fixed by developers and to what extent developers would be willing to fix smells in the first place. The aim of our study is twofold. First, we want to understand which Dockerfile smells receive more attention from developers, i.e., are fixed more frequently in the history of open-source projects. Second, we want to check whether developers are willing to accept changes aimed at fixing Dockerfile smells (e.g., generated by an automated tool), to understand if they care about them. We evaluated the survivability of Dockerfile smells in a total of 53,456 unique Dockerfiles, manually validating a large sample of smell-removing commits to understand (i) whether developers performed the change with the intention of removing bad practices, and (ii) whether they were aware of the removed smell. In the second part, we used a rule-based tool to automatically fix Dockerfile smells. Then, we proposed such fixes to developers via pull requests. Finally, we quantitatively and qualitatively evaluated the outcome after a monitoring period of more than 7 months. The results of our study show that most developers pay more attention to changes aimed at improving the performance of Dockerfiles (image size and build time). Moreover, they are willing to accept fixes for the most common smells, with some exceptions (e.g., missing version pinning for OS packages).
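As an illustration of a rule-based check for one smell mentioned above, missing version pinning for OS packages (the violation hadolint reports as DL3008), here is a minimal sketch; the function name and heuristics are assumptions, not the authors' tool:

import re

# RUN lines that invoke apt-get install; group(1) captures the arguments.
RUN_APT = re.compile(r"^RUN\s+.*apt-get\s+install\s+(.+)$")

def unpinned_apt_packages(dockerfile_text):
    # Yield (line_no, package) for apt-get packages without an '=version' pin.
    for no, line in enumerate(dockerfile_text.splitlines(), 1):
        m = RUN_APT.match(line.strip())
        if not m:
            continue
        for tok in m.group(1).split():
            if tok.startswith("-") or "=" in tok or tok in ("&&", "\\"):
                continue  # skip flags, pinned packages, and shell glue
            yield no, tok

example = "RUN apt-get install -y curl git=1:2.34.1-1ubuntu1"
print(list(unpinned_apt_packages(example)))  # -> [(1, 'curl')]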
{"title":"Fixing Dockerfile smells: an empirical study","authors":"Giovanni Rosa, Federico Zappone, Simone Scalabrino, Rocco Oliveto","doi":"10.1007/s10664-024-10471-7","DOIUrl":"https://doi.org/10.1007/s10664-024-10471-7","url":null,"abstract":"<p>Docker is the <i>de facto</i> standard for software containerization. A Dockerfile contains the requirements to build a Docker image containing a target application. There are several best practice rules for writing Dockerfiles, but the developers do not always follow them. Violations of such practices, known as Dockerfile smells, can negatively impact the reliability and performance of Docker images. Previous studies showed that Dockerfile smells are widely diffused, and there is a lack of automatic tools that support developers in fixing them. However, it is still unclear what Dockerfile smells get fixed by developers and to what extent developers would be willing to fix smells in the first place. The aim of our study is twofold. First, we want to understand what Dockerfiles smells receive more attention from developers, i.e., are fixed more frequently in the history of open-source projects. Second, we want to check if developers are willing to accept changes aimed at fixing Dockerfile smells (e.g., generated by an automated tool), to understand if they care about them. We evaluated the survivability of Dockerfile smells from a total of 53,456 unique Dockerfiles, where we manually validated a large sample of smell-removing commits to understand (i) if developers performed the change with the intention of removing bad practices, and (ii) if they were aware of the removed smell. In the second part, we used a rule-based tool to automatically fix Dockerfile smells. Then, we proposed such fixes to developers via pull requests. Finally, we quantitatively and qualitatively evaluated the outcome after a monitoring period of more than 7 months. The results of our study showed that most developers pay more attention to changes aimed at improving the performance of Dockerfiles (image size and build time). Moreover, they are willing to accept the fixes for the most common smells, with some exceptions (e.g., missing version pinning for OS packages).</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"18 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141567176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Design smells in multi-language systems and bug-proneness: a survival analysis
Pub Date : 2024-07-03 | DOI: 10.1007/s10664-024-10476-2
Mouna Abidi, Md Saidur Rahman, Moses Openja, Foutse Khomh
Modern applications are often developed using a combination of programming languages and technologies. Multi-language systems offer opportunities for code reuse and the possibility to leverage the strengths of multiple programming languages. However, multi-language development may also impede code comprehension and increase maintenance overhead. As a result, developers may introduce design smells by making poor design and implementation choices. Studies on mono-language systems suggest that design smells may increase the risk of bugs and negatively impact software quality. However, the impacts of multi-language smells on software quality are still under-investigated. In this paper, we aim to examine the impacts of multi-language smells on software quality, bug-proneness in particular. We performed a survival analysis comparing the time until a bug occurrence in files with and without multi-language design smells. To gain qualitative insights into the impacts of multi-language design smells on software bug-proneness, we performed topic modeling and manual investigations to capture the categories and characteristics of bugs that frequently occur in files with multi-language smells. Our investigation shows that (1) files with multi-language smells experience bugs faster than files without those smells, and non-smelly files have hazard rates 87.5% lower than files with smells, (2) files with certain types of smells experience bugs faster than files with other smells, and (3) bugs related to “programming errors”, “external libraries and features support issues”, and “memory issues” are the most dominant types of bugs occurring in files with multi-language smells. Through this study, we aim to raise developers' awareness of the impacts of multi-language design smells and help them prioritize maintenance activities.
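The survival analysis described above can be approximated with standard tooling. A minimal sketch using the lifelines library (an assumption; the paper's exact setup may differ), where durations are days until the first bug touches a file and censored files never exhibited a bug:

import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter

# Illustrative data: duration = days until first bug (or end of observation),
# observed = 1 if a bug occurred (0 = censored), has_smell = smell present.
df = pd.DataFrame({
    "duration": [120, 45, 300, 30, 210, 15],
    "observed": [1, 1, 0, 1, 0, 1],
    "has_smell": [0, 1, 0, 1, 0, 1],
})

# Kaplan-Meier curves per group: how long files "survive" without a bug.
kmf = KaplanMeierFitter()
for label, group in df.groupby("has_smell"):
    kmf.fit(group["duration"], group["observed"], label=f"smell={label}")
    print(label, kmf.median_survival_time_)

# Cox model: the has_smell coefficient gives the hazard ratio between
# smelly and non-smelly files (cf. the 87.5% lower hazard reported above).
cph = CoxPHFitter().fit(df, duration_col="duration", event_col="observed")
cph.print_summary()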
{"title":"Design smells in multi-language systems and bug-proneness: a survival analysis","authors":"Mouna Abidi, Md Saidur Rahman, Moses Openja, Foutse Khomh","doi":"10.1007/s10664-024-10476-2","DOIUrl":"https://doi.org/10.1007/s10664-024-10476-2","url":null,"abstract":"<p>Modern applications are often developed using a combination of programming languages and technologies. Multi-language systems offer opportunities for code reuse and the possibility to leverage the strengths of multiple programming languages. However, multi-language development may also impede code comprehension and increase maintenance overhead. As a result of this, developers may introduce design smells by making poor design and implementation choices. Studies on mono-language systems suggest that design smells may increase the risk of bugs and negatively impact software quality. However, the impacts of multi-language smells on software quality are still under-investigated. In this paper, we aim to examine the impacts of multi-language smells on software quality, bug-proneness in particular. We performed survival analysis comparing the time until a bug occurrence in files with and without multi-language design smells. To have qualitative insights into the impacts of multi-language design smells on software bug-proneness, we performed topic modeling and manual investigations, to capture the categories and characteristics of bugs that frequently occur in files with multi-language smells. Our investigation shows that (1) files with multi-language smells experience bugs faster than files without those smells, and non-smelly files have hazard rates 87.5% lower than files with smells, (2) files with some specific types of smells experience bugs faster than the other smells, and (3) bugs related to “programming errors”, “external libraries and features support issues”, and “memory issues” are the most dominant types of bugs that occur in files with multi-language smells. Through this study, we aim to raise the awareness of developers about the impacts of multi-language design smells, and help them prioritize maintenance activities.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"17 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141550952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
What causes exceptions in machine learning applications? Mining machine learning-related stack traces on Stack Overflow
Pub Date : 2024-07-03 | DOI: 10.1007/s10664-024-10499-9
Amin Ghadesi, Maxime Lamothe, Heng Li
Machine learning (ML), including deep learning, has recently gained tremendous popularity in a wide range of applications. However, like traditional software, ML applications are not immune to the bugs that result from programming errors. Explicit programming errors usually manifest through error messages and stack traces. These stack traces describe the chain of function calls that leads to an anomalous situation, or exception. Indeed, these exceptions may cross the entire software stack (including applications and libraries). Thus, studying the ML-related patterns in stack traces can help practitioners and researchers understand the causes of exceptions in ML applications and the challenges faced by ML developers. To that end, we mine Stack Overflow (SO) and study 18,538 ML-related stack traces related to seven popular Python ML libraries. First, we observe that ML questions that contain stack traces are less likely to get accepted answers than questions that don't, even though they gain more attention (i.e., more views and comments). Second, we observe that recurrent patterns exist in ML stack traces, even across different ML libraries, with a small portion of patterns covering many stack traces. Third, we derive five high-level categories and 26 low-level types from the stack trace patterns: most patterns are related to model training, Python basic syntax, parallelization, subprocess invocation, and external module execution. Furthermore, the patterns related to external dependencies (e.g., file operations) or manipulations of artifacts (e.g., model conversion) are among the least likely to get accepted answers on SO. Our findings provide insights for researchers, ML library developers, and technical forum moderators to better support ML developers in writing error-free ML code. For example, future research can leverage the common patterns of stack traces to help ML developers locate solutions to problems similar to theirs, or to identify experts who have experience solving similar patterns of problems. Researchers and ML library developers could prioritize efforts to help ML developers identify misuses of ML APIs, mismatches in data formats, and potential data/resource contentions, so that ML developers can better avoid/fix model-related, data-related, and multi-process-related exception patterns, respectively.
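A minimal sketch of the first mining step such a study requires, extracting the exception type and call chain from a Python stack trace; the regexes and names are illustrative assumptions, not the authors' pipeline:

import re

# One frame line: File "<path>", line <n>, in <function>
FRAME = re.compile(r'File "(?P<file>[^"]+)", line \d+, in (?P<func>\S+)')
# Final line naming the raised exception type.
FINAL = re.compile(r"^(?P<exc>\w+(?:\.\w+)*(?:Error|Exception|Warning))",
                   re.MULTILINE)

def parse_trace(text):
    # Return (exception_type, [functions along the call chain]).
    funcs = [m.group("func") for m in FRAME.finditer(text)]
    exc = FINAL.search(text)
    return (exc.group("exc") if exc else None, funcs)

trace = '''Traceback (most recent call last):
  File "train.py", line 10, in <module>
  File "keras/models.py", line 88, in fit
ValueError: shapes (3,) and (2,) not aligned'''
print(parse_trace(trace))  # -> ('ValueError', ['<module>', 'fit'])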
{"title":"What causes exceptions in machine learning applications? Mining machine learning-related stack traces on Stack Overflow","authors":"Amin Ghadesi, Maxime Lamothe, Heng Li","doi":"10.1007/s10664-024-10499-9","DOIUrl":"https://doi.org/10.1007/s10664-024-10499-9","url":null,"abstract":"<p>Machine learning (ML), including deep learning, has recently gained tremendous popularity in a wide range of applications. However, like traditional software, ML applications are not immune to the bugs that result from programming errors. Explicit programming errors usually manifest through error messages and stack traces. These stack traces describe the chain of function calls that lead to an anomalous situation, or exception. Indeed, these exceptions may cross the entire software stack (including applications and libraries). Thus, studying the ML-related patterns in stack traces can help practitioners and researchers understand the causes of exceptions in ML applications and the challenges faced by ML developers. To that end, we mine Stack Overflow (SO) and study 18, 538 ML-related stack traces related to seven popular Python ML libraries. First, we observe that ML questions that contain stack traces are less likely to get accepted answers than questions that don’t, even though they gain more attention (i.e., more views and comments). Second, we observe that recurrent patterns exist in ML stack traces, even across different ML libraries, with a small portion of patterns covering many stack traces. Third, we derive five high-level categories and 26 low-level types from the stack trace patterns: most patterns are related to <i>model training</i>, <i>python basic syntax</i>, <i>parallelization</i>, <i>subprocess invocation</i>, and <i>external module execution</i>. Furthermore, the patterns related to external dependencies (e.g., file operations) or manipulations of artifacts (e.g., model conversion) are among the least likely to get accepted answers on SO. Our findings provide insights for researchers, ML library developers, and technical forum moderators to better support ML developers in writing error-free ML code. For example, future research can leverage the common patterns of stack traces to help ML developers locate solutions to problems similar to theirs or to identify experts who have experience solving similar patterns of problems. Researchers and ML library developers could prioritize efforts to help ML developers identify misuses of ML APIs, mismatches in data formats, and potential data/resource contentions so that ML developers can better avoid/fix model-related exception patterns, data-related exception patterns, and multi-process-related exception patterns, respectively.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"1 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141550953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Systematic Evaluation of Deep Learning Models for Log-based Failure Prediction
Pub Date : 2024-06-20 | DOI: 10.1007/s10664-024-10501-4
Fatemeh Hadadi, Joshua H. Dawes, Donghwan Shin, Domenico Bianculli, Lionel Briand
With the increasing complexity and scope of software systems, their dependability is crucial. The analysis of log data recorded during system execution can enable engineers to automatically predict failures at run time. Several Machine Learning (ML) techniques, including traditional ML and Deep Learning (DL), have been proposed to automate such tasks. However, current empirical studies are limited in terms of covering all main DL types, namely Recurrent Neural Network (RNN), Convolutional Neural Network (CNN), and transformer, as well as examining them on a wide range of diverse datasets. In this paper, we aim to address these issues by systematically investigating the combination of log data embedding strategies and DL types for failure prediction. To that end, we propose a modular architecture to accommodate various configurations of embedding strategies and DL-based encoders. To further investigate how dataset characteristics such as dataset size and failure percentage affect model accuracy, we synthesised 360 datasets, with varying characteristics, for three distinct system behavioural models, based on a systematic and automated generation approach. Using the F1 score metric, our results show that the best overall performing configuration is a CNN-based encoder with Logkey2vec. Additionally, we provide specific dataset conditions, namely a dataset size greater than 350 or a failure percentage above 7.5%, under which this configuration demonstrates high accuracy for failure prediction.
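A minimal sketch of a CNN-based encoder over log-key sequences, in the spirit of the best-performing configuration reported above; it is written in PyTorch with illustrative dimensions, and a plain trainable embedding stands in for Logkey2vec:

import torch
import torch.nn as nn

class CNNFailurePredictor(nn.Module):
    def __init__(self, n_log_keys=200, emb_dim=32, n_filters=64, kernel=3):
        super().__init__()
        self.emb = nn.Embedding(n_log_keys, emb_dim)  # stand-in for Logkey2vec
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel)
        self.head = nn.Linear(n_filters, 2)           # failure vs. no failure

    def forward(self, key_ids):                       # key_ids: (batch, seq_len)
        x = self.emb(key_ids).transpose(1, 2)         # -> (batch, emb, seq)
        x = torch.relu(self.conv(x)).max(dim=2).values  # global max pooling
        return self.head(x)                           # -> (batch, 2) logits

model = CNNFailurePredictor()
logits = model(torch.randint(0, 200, (8, 50)))        # 8 sequences of 50 keys
print(logits.shape)                                   # torch.Size([8, 2])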
{"title":"Systematic Evaluation of Deep Learning Models for Log-based Failure Prediction","authors":"Fatemeh Hadadi, Joshua H. Dawes, Donghwan Shin, Domenico Bianculli, Lionel Briand","doi":"10.1007/s10664-024-10501-4","DOIUrl":"https://doi.org/10.1007/s10664-024-10501-4","url":null,"abstract":"<p>With the increasing complexity and scope of software systems, their dependability is crucial. The analysis of log data recorded during system execution can enable engineers to automatically predict failures at run time. Several Machine Learning (ML) techniques, including traditional ML and Deep Learning (DL), have been proposed to automate such tasks. However, current empirical studies are limited in terms of covering all main DL types—Recurrent Neural Network (RNN), Convolutional Neural Network (CNN), and transformer—as well as examining them on a wide range of diverse datasets. In this paper, we aim to address these issues by systematically investigating the combination of log data embedding strategies and DL types for failure prediction. To that end, we propose a modular architecture to accommodate various configurations of embedding strategies and DL-based encoders. To further investigate how dataset characteristics such as dataset size and failure percentage affect model accuracy, we synthesised 360 datasets, with varying characteristics, for three distinct system behavioural models, based on a systematic and automated generation approach. Using the F1 score metric, our results show that the best overall performing configuration is a CNN-based encoder with Logkey2vec. Additionally, we provide specific dataset conditions, namely a dataset size <span>(>350)</span> or a failure percentage <span>(>7.5%)</span>, under which this configuration demonstrates high accuracy for failure prediction.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"16 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141506565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A comprehensive analysis of challenges and strategies for software release notes on GitHub
Pub Date : 2024-06-20 | DOI: 10.1007/s10664-024-10486-0
Jianyu Wu, Hao He, Kai Gao, Wenxin Xiao, Jingyue Li, Minghui Zhou
Release notes (RNs) refer to the technical documentation that offers users, developers, and other stakeholders comprehensive information about the changes and updates of a new software version. Producing high-quality RNs can be challenging, and it remains unknown what issues developers commonly encounter and what effective strategies can be adopted to mitigate them. To bridge this knowledge gap, we conduct a manual analysis of the 1,529 latest RN-related issues in the GitHub issue tracker, using multiple rounds of open coding, and construct 1) a comprehensive taxonomy of RN-related issues with four dimensions, validated through three semi-structured interviews; 2) an effective framework with eight categories of strategies to overcome these challenges. The four dimensions of RN-related issues revealed by the taxonomy, and the corresponding strategies from the framework, are: 1) Content (419, 25.47%): RN producers tend to overlook information rather than include inaccurate details, especially for breaking changes. To address this, effective completeness validations are recommended, such as managing the Pull Requests, issues, and commits related to RNs; 2) Presentation (150, 9.12%): inadequate layout may bury important information and lead to end users' confusion, which can be mitigated by employing a hierarchical structure, a standardized format, RN rendering, and folding techniques; 3) Accessibility (303, 18.42%): many users find RNs inaccessible due to link deterioration, insufficient notification, and obfuscated RN locations. This can be alleviated by adopting appropriate locations and channels (such as project websites) and standardizing link management; 4) Production (773, 46.99%): despite the high demand from RN producers, automating and standardizing the RN production process remains challenging. Developers resolve this problem by using mature tools on GitHub (like Release Drafter). Additionally, offering guidance, clarifying responsibilities, and distributing workloads are effective in improving collaboration within the team. Mechanisms for distributing and verifying RNs are also adopted to enhance synchronization management. Our taxonomy provides a comprehensive blueprint to improve RN production in practice and also reveals interesting future research directions.
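One Production strategy mentioned above, drafting hierarchically structured RNs from merged pull requests grouped by label, is what tools like Release Drafter automate. A minimal sketch against the public GitHub REST API; the repository name and label-to-section mapping are illustrative assumptions:

import requests
from collections import defaultdict

SECTIONS = {"bug": "Bug Fixes", "enhancement": "Features", "docs": "Documentation"}

def draft_release_notes(owner, repo):
    # List recently closed PRs via the public GitHub REST API.
    url = f"https://api.github.com/repos/{owner}/{repo}/pulls"
    prs = requests.get(url, params={"state": "closed", "per_page": 100}).json()
    grouped = defaultdict(list)
    for pr in prs:
        if not pr.get("merged_at"):
            continue  # skip closed-but-unmerged PRs
        labels = {label["name"] for label in pr["labels"]}
        section = next((SECTIONS[l] for l in labels if l in SECTIONS), "Other")
        grouped[section].append(f"- {pr['title']} (#{pr['number']})")
    # Render each section as a heading followed by its bullet list.
    return "\n\n".join(f"{s}\n" + "\n".join(items) for s, items in grouped.items())

print(draft_release_notes("octocat", "hello-world"))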
{"title":"A comprehensive analysis of challenges and strategies for software release notes on GitHub","authors":"Jianyu Wu, Hao He, Kai Gao, Wenxin Xiao, Jingyue Li, Minghui Zhou","doi":"10.1007/s10664-024-10486-0","DOIUrl":"https://doi.org/10.1007/s10664-024-10486-0","url":null,"abstract":"<p>Release notes (RNs) refer to the technical documentation that offers users, developers, and other stakeholders comprehensive information about the changes and updates of a new software version. Producing high-quality RNs can be challenging, and it remains unknown what issues developers commonly encounter and what effective strategies can be adopted to mitigate them. To bridge this knowledge gap, we conduct a manual analysis of 1,529 latest RN-related issues in the GitHub issue tracker by using multiple rounds of open coding and construct 1) a comprehensive taxonomy of RN-related issues with four dimensions validated through three semi-structured interviews; 2) an effective framework with eight categories of strategies to overcome these challenges. The four dimensions of RN-related issues revealed by the taxonomy and the corresponding strategies from the framework include: 1) <i>Content</i> (419, 25.47%): RN producers tend to overlook information rather than include inaccurate details, especially for breaking changes. To address this, effective completeness validations are recommended, such as managing Pull Requests, issues, and commits related to RNs; 2) <i>Presentation</i> (150, 9.12%): inadequate layout may bury important information and lead to end users’ confusion, which can be mitigated by employing a hierarchical structure, standardized format, rendering RNs, and folding techniques; 3) <i>Accessibility</i> (303, 18.42%): many users find RNs inaccessible due to link deterioration, insufficient notification, and obfuscated RN locations. This can be alleviated by adopting appropriate locations and channels (such as project websites) and standardizing link management.; 4) <i>Production</i> (773, 46.99%): despite the high demand from RN producers, automating and standardizing the RN production process remains challenging. Developers resolve this problem by using some mature tools on GitHub (like Release Drafter). Additionally, offering guidance, clarifying responsibilities, and distributing workloads are effective in improving collaboration within the team. Mechanisms for distributing and verifying RNs are also selected to enhance synchronization management. Our taxonomy provides a comprehensive blueprint to improve RN production in practice and also reveals interesting future research directions.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"60 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141529534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A literature review and existing challenges on software logging practices
Pub Date : 2024-06-18 | DOI: 10.1007/s10664-024-10452-w
Mohamed Amine Batoun, Mohammed Sayagh, Roozbeh Aghili, Ali Ouni, Heng Li
Software logging is the practice of recording the different events and activities that occur within a software system; such logs are useful for activities such as failure prediction and anomaly detection. While previous research focused on improving different aspects of logging practices, the goal of this paper is to conduct a systematic literature review of software logging practices and to identify the challenges practitioners still face. We focus on the logging practices that cover the steps from the instrumentation of a software system, through the storage of logs, up to the preprocessing steps that prepare log data for further follow-up analysis. Our systematic literature review (SLR) covers 204 research papers, together with a quantitative analysis of 20,766 questions and a qualitative analysis of 149 questions on Stack Overflow (SO). We observe that 53% of the studies focus on improving the techniques that preprocess logs for analysis (e.g., log parsing, log clustering, and log mining), 37% focus on how to create new log statements, and 10% focus on how to improve log file storage. Our analysis of SO topics reveals that five out of the seven identified high-level topics are not covered by the literature; they relate to the dependency configuration of logging libraries, infrastructure-related configuration, scattered logging, context-dependent usage of logs, and the handling of log files.
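A minimal sketch of log parsing, one of the preprocessing practices covered by the review: abstracting the variable parts of log lines into templates. The masking rules are illustrative; research-grade parsers such as Drain are far more sophisticated:

import re
from collections import Counter

# Order matters: mask IPs before bare numbers so octets are not split.
MASKS = [
    (re.compile(r"\b\d+\.\d+\.\d+\.\d+\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]

def to_template(line):
    # Replace variable tokens with placeholders, leaving the constant text.
    for pattern, token in MASKS:
        line = pattern.sub(token, line)
    return line

logs = [
    "Connection from 10.0.0.1 port 5050 failed",
    "Connection from 10.0.0.7 port 8080 failed",
]
print(Counter(to_template(l) for l in logs))
# Counter({'Connection from <IP> port <NUM> failed': 2})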
{"title":"A literature review and existing challenges on software logging practices","authors":"Mohamed Amine Batoun, Mohammed Sayagh, Roozbeh Aghili, Ali Ouni, Heng Li","doi":"10.1007/s10664-024-10452-w","DOIUrl":"https://doi.org/10.1007/s10664-024-10452-w","url":null,"abstract":"<p>Software logging is the practice of recording different events and activities that occur within a software system, which are useful for different activities such as failure prediction and anomaly detection. While previous research focused on improving different aspects of logging practices, the goal of this paper is to conduct a systematic literature review and the existing challenges of practitioners in software logging practices. In this paper, we focus on the logging practices that cover the steps from the instrumentation of a software system, the storage of logs, up to the preprocessing steps that prepare log data for further follow-up analysis. Our systematic literature review (SLR) covers 204 research papers and a quantitative and qualitative analysis of 20,766 and 149 questions on StackOverflow (SO). We observe that 53% of the studies focus on improving the techniques that preprocess logs for analysis (e.g., the practices of log parsing, log clustering and log mining), 37% focus on how to create new log statements, and 10% focus on how to improve log file storage. Our analysis of SO topics reveals that five out of seven identified high-level topics are not covered by the literature and are related to dependency configuration of logging libraries, infrastructure related configuration, scattered logging, context-dependant usage of logs and handling log files.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"158 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141506568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Engineering recommender systems for modelling languages: concept, tool and evaluation
Pub Date : 2024-06-18 | DOI: 10.1007/s10664-024-10483-3
Lissette Almonte, Esther Guerra, Iván Cantador, Juan de Lara
Recommender systems (RSs) are ubiquitous in all sorts of online applications, in areas like shopping, media broadcasting, and travel and tourism, among many others. They are also commonly used to support software engineering tasks, including software modelling, where we have recently witnessed proposals to enrich modelling languages and environments with RSs. Modelling recommenders assist users in building models by suggesting items based on previous solutions to similar problems in the same domain. However, building an RS for a modelling language requires considerable effort and specialised knowledge. To alleviate this problem, we propose an automated, model-driven approach to create RSs for modelling languages. The approach provides a domain-specific language called Droid to configure every aspect of the RS: the type of the recommended modelling elements, the gathering and preprocessing of training data, the recommendation method, and the metrics used to evaluate the created RS. The RS so configured can be deployed as a service, and we offer out-of-the-box integration with Eclipse modelling editors. Moreover, the language is extensible with new data sources and recommendation methods. To assess the usefulness of our proposal, we report on two evaluations. The first one is an offline experiment measuring the precision, completeness and diversity of the recommendations generated by several methods. The second is a user study, with 40 participants, assessing the perceived quality of the recommendations. The study also contributes a novel evaluation methodology and metrics for RSs in model-driven engineering.
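A minimal sketch of two offline metrics of the kind mentioned above, as they are commonly defined for recommender systems; the exact formulations used in the Droid evaluation may differ, and the item names are illustrative:

def precision_at_k(recommended, relevant, k):
    # Fraction of the top-k recommended items that are relevant.
    top_k = recommended[:k]
    return sum(1 for item in top_k if item in relevant) / k

def catalog_coverage(all_recommendations, catalog):
    # Diversity proxy: share of the catalog that ever gets recommended.
    seen = {item for recs in all_recommendations for item in recs}
    return len(seen) / len(catalog)

recs = ["attr:name", "ref:owner", "attr:id"]
print(precision_at_k(recs, {"attr:name", "attr:date"}, k=3))   # 0.333...
print(catalog_coverage([recs],
                       ["attr:name", "ref:owner", "attr:id", "attr:date"]))  # 0.75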
{"title":"Engineering recommender systems for modelling languages: concept, tool and evaluation","authors":"Lissette Almonte, Esther Guerra, Iván Cantador, Juan de Lara","doi":"10.1007/s10664-024-10483-3","DOIUrl":"https://doi.org/10.1007/s10664-024-10483-3","url":null,"abstract":"<p>Recommender systems (RSs) are ubiquitous in all sorts of online applications, in areas like shopping, media broadcasting, travel and tourism, among many others. They are also common to help in software engineering tasks, including software modelling, where we are recently witnessing proposals to enrich modelling languages and environments with RSs. Modelling recommenders assist users in building models by suggesting items based on previous solutions to similar problems in the same domain. However, building a RS for a modelling language requires considerable effort and specialised knowledge. To alleviate this problem, we propose an automated, model-driven approach to create RSs for modelling languages. The approach provides a domain-specific language called <span>Droid</span> to configure every aspect of the RS: the type of the recommended modelling elements, the gathering and preprocessing of training data, the recommendation method, and the metrics used to evaluate the created RS. The RS so configured can be deployed as a service, and we offer out-of-the-box integration with Eclipse modelling editors. Moreover, the language is extensible with new data sources and recommendation methods. To assess the usefulness of our proposal, we report on two evaluations. The first one is an offline experiment measuring the precision, completeness and diversity of recommendations generated by several methods. The second is a user study – with 40 participants – to assess the perceived quality of the recommendations. The study also contributes with a novel evaluation methodology and metrics for RSs in model-driven engineering.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"345 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141506567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
OpenSCV: an open hierarchical taxonomy for smart contract vulnerabilities
Pub Date : 2024-06-18 | DOI: 10.1007/s10664-024-10446-8
Fernando Richter Vidal, Naghmeh Ivaki, Nuno Laranjeiro
Smart contracts are nowadays at the core of most blockchain systems. Like all computer programs, smart contracts are subject to the presence of residual faults, including severe security vulnerabilities. However, the key distinction lies in how these vulnerabilities are addressed. In smart contracts, when a vulnerability is identified, the affected contract must be terminated within the blockchain, since, owing to the immutable nature of blockchains, it is impossible to patch a contract once deployed. In this context, research efforts have focused on proactively preventing the deployment of smart contracts containing vulnerabilities, mainly through the development of vulnerability detection tools. Along with these efforts, several heterogeneous vulnerability classification schemes appeared (most notably DASP and SWC). At the time of writing, these are mostly outdated initiatives, even though new smart contract vulnerabilities are consistently uncovered. In this paper, we propose OpenSCV, a new and Open hierarchical taxonomy for Smart Contract vulnerabilities, which is open to community contributions and matches the current state of the practice, while being prepared to handle future modifications and evolution. The taxonomy was built based on the analysis of the existing research on vulnerability classification, community-maintained classification schemes, and research on smart contract vulnerability detection. We show how OpenSCV covers the announced detection ability of current vulnerability detection tools and highlight its usefulness in smart contract vulnerability research. To validate OpenSCV, we performed an expert-based analysis wherein we invited multiple experts engaged in smart contract security research to participate in a questionnaire. The feedback from these experts indicated that the categories in OpenSCV are representative, clear, easily understandable, comprehensive, and highly useful. Regarding the vulnerabilities, the experts confirmed that they are easily understandable.
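A minimal sketch of how a hierarchical taxonomy such as OpenSCV can be represented and traversed programmatically; the category names and nesting below are illustrative, not OpenSCV's actual hierarchy:

# Illustrative two-level hierarchy: category -> subcategory -> variants.
taxonomy = {
    "Unsafe External Calls": {
        "Reentrancy": ["unprotected state update after external call"],
    },
    "Arithmetic": {
        "Integer Overflow/Underflow": ["unchecked arithmetic on user input"],
    },
}

def paths(tree, prefix=()):
    # Yield every root-to-leaf path, useful for mapping detector outputs
    # onto taxonomy entries.
    for key, sub in tree.items():
        if isinstance(sub, dict):
            yield from paths(sub, prefix + (key,))
        else:
            for leaf in sub:
                yield prefix + (key, leaf)

for p in paths(taxonomy):
    print(" > ".join(p))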
{"title":"OpenSCV: an open hierarchical taxonomy for smart contract vulnerabilities","authors":"Fernando Richter Vidal, Naghmeh Ivaki, Nuno Laranjeiro","doi":"10.1007/s10664-024-10446-8","DOIUrl":"https://doi.org/10.1007/s10664-024-10446-8","url":null,"abstract":"<p>Smart contracts are nowadays at the core of most blockchain systems. Like all computer programs, smart contracts are subject to the presence of residual faults, including severe security vulnerabilities. However, the key distinction lies in how these vulnerabilities are addressed. In smart contracts, when a vulnerability is identified, the affected contract must be terminated within the blockchain, as due to the immutable nature of blockchains, it is impossible to patch a contract once deployed. In this context, research efforts have been focused on proactively preventing the deployment of smart contracts containing vulnerabilities, mainly through the development of vulnerability detection tools. Along with these efforts, several heterogeneous vulnerability classification schemes appeared (e.g., most notably DASP and SWC). At the time of writing, these are mostly outdated initiatives, even though new smart contract vulnerabilities are consistently uncovered. In this paper, we propose OpenSCV, a new and Open hierarchical taxonomy for Smart Contract vulnerabilities, which is open to community contributions and matches the current state of the practice while being prepared to handle future modifications and evolution. The taxonomy was built based on the analysis of the existing research on vulnerability classification, community-maintained classification schemes, and research on smart contract vulnerability detection. We show how OpenSCV covers the announced detection ability of the current vulnerability detection tools and highlight its usefulness in smart contract vulnerability research. To validate OpenSCV, we performed an expert-based analysis wherein we invited multiple experts engaged in smart contract security research to participate in a questionnaire. The feedback from these experts indicated that the categories in OpenSCV are representative, clear, easily understandable, comprehensive, and highly useful. Regarding the vulnerabilities, the experts confirmed that they are easily understandable.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"174 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141506566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}