What causes exceptions in machine learning applications? Mining machine learning-related stack traces on Stack Overflow

IF 3.5 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Empirical Software Engineering | Pub Date: 2024-07-03 | DOI: 10.1007/s10664-024-10499-9
Amin Ghadesi, Maxime Lamothe, Heng Li
{"title":"What causes exceptions in machine learning applications? Mining machine learning-related stack traces on Stack Overflow","authors":"Amin Ghadesi, Maxime Lamothe, Heng Li","doi":"10.1007/s10664-024-10499-9","DOIUrl":null,"url":null,"abstract":"<p>Machine learning (ML), including deep learning, has recently gained tremendous popularity in a wide range of applications. However, like traditional software, ML applications are not immune to the bugs that result from programming errors. Explicit programming errors usually manifest through error messages and stack traces. These stack traces describe the chain of function calls that lead to an anomalous situation, or exception. Indeed, these exceptions may cross the entire software stack (including applications and libraries). Thus, studying the ML-related patterns in stack traces can help practitioners and researchers understand the causes of exceptions in ML applications and the challenges faced by ML developers. To that end, we mine Stack Overflow (SO) and study 18, 538 ML-related stack traces related to seven popular Python ML libraries. First, we observe that ML questions that contain stack traces are less likely to get accepted answers than questions that don’t, even though they gain more attention (i.e., more views and comments). Second, we observe that recurrent patterns exist in ML stack traces, even across different ML libraries, with a small portion of patterns covering many stack traces. Third, we derive five high-level categories and 26 low-level types from the stack trace patterns: most patterns are related to <i>model training</i>, <i>python basic syntax</i>, <i>parallelization</i>, <i>subprocess invocation</i>, and <i>external module execution</i>. Furthermore, the patterns related to external dependencies (e.g., file operations) or manipulations of artifacts (e.g., model conversion) are among the least likely to get accepted answers on SO. Our findings provide insights for researchers, ML library developers, and technical forum moderators to better support ML developers in writing error-free ML code. For example, future research can leverage the common patterns of stack traces to help ML developers locate solutions to problems similar to theirs or to identify experts who have experience solving similar patterns of problems. Researchers and ML library developers could prioritize efforts to help ML developers identify misuses of ML APIs, mismatches in data formats, and potential data/resource contentions so that ML developers can better avoid/fix model-related exception patterns, data-related exception patterns, and multi-process-related exception patterns, respectively.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":null,"pages":null},"PeriodicalIF":3.5000,"publicationDate":"2024-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Empirical Software Engineering","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10664-024-10499-9","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
引用次数: 0

Abstract

Machine learning (ML), including deep learning, has recently gained tremendous popularity in a wide range of applications. However, like traditional software, ML applications are not immune to the bugs that result from programming errors. Explicit programming errors usually manifest through error messages and stack traces. These stack traces describe the chain of function calls that leads to an anomalous situation, or exception. Indeed, these exceptions may cross the entire software stack (including applications and libraries). Thus, studying the ML-related patterns in stack traces can help practitioners and researchers understand the causes of exceptions in ML applications and the challenges faced by ML developers. To that end, we mine Stack Overflow (SO) and study 18,538 ML-related stack traces related to seven popular Python ML libraries. First, we observe that ML questions that contain stack traces are less likely to get accepted answers than questions that don’t, even though they gain more attention (i.e., more views and comments). Second, we observe that recurrent patterns exist in ML stack traces, even across different ML libraries, with a small portion of patterns covering many stack traces. Third, we derive five high-level categories and 26 low-level types from the stack trace patterns: most patterns are related to model training, Python basic syntax, parallelization, subprocess invocation, and external module execution. Furthermore, the patterns related to external dependencies (e.g., file operations) or manipulations of artifacts (e.g., model conversion) are among the least likely to get accepted answers on SO. Our findings provide insights for researchers, ML library developers, and technical forum moderators to better support ML developers in writing error-free ML code. For example, future research can leverage the common patterns of stack traces to help ML developers locate solutions to problems similar to theirs or to identify experts who have experience solving similar patterns of problems. Researchers and ML library developers could prioritize efforts to help ML developers identify misuses of ML APIs, mismatches in data formats, and potential data/resource contentions so that ML developers can better avoid/fix model-related exception patterns, data-related exception patterns, and multi-process-related exception patterns, respectively.
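To make the idea of a recurring stack trace pattern concrete, the sketch below shows one way Python tracebacks could be pulled out of question bodies and grouped by their exception type. This is a minimal illustration under stated assumptions, not the authors' mining pipeline: the post bodies, the regular expression, and the grouping key are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical post bodies; in practice they would come from a Stack Overflow
# data dump or the Stack Exchange API (not reproduced here).
posts = [
    'Traceback (most recent call last):\n'
    '  File "train.py", line 12, in <module>\n'
    '    model.fit(X, y)\n'
    'ValueError: Found input variables with inconsistent numbers of samples',
    'Traceback (most recent call last):\n'
    '  File "run.py", line 8, in <module>\n'
    '    results = pool.map(worker, batches)\n'
    'BrokenPipeError: [Errno 32] Broken pipe',
]

# A CPython traceback starts with a fixed header and ends with a line naming
# the exception type; the lazy group skips over the intermediate frame lines.
TRACEBACK_RE = re.compile(
    r'Traceback \(most recent call last\):\n(?:.+\n)+?([A-Za-z_]\w*(?:Error|Exception))'
)

def exception_types(body: str) -> list[str]:
    """Return the exception type of every traceback found in a post body."""
    return TRACEBACK_RE.findall(body)

# Group traces by exception type as a crude stand-in for pattern mining.
pattern_counts = Counter(t for body in posts for t in exception_types(body))
print(pattern_counts.most_common())
# [('ValueError', 1), ('BrokenPipeError', 1)]
```

A full analysis would likely consider the whole chain of function calls in each trace rather than just the final exception type, but even this coarse grouping hints at why a small set of recurrent, library-spanning patterns can cover many traces.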


Source journal: Empirical Software Engineering (Engineering & Technology - Computer Science: Software Engineering)
CiteScore: 8.50
Self-citation rate: 12.20%
Articles published: 169
Review time: >12 weeks
Journal description: Empirical Software Engineering provides a forum for applied software engineering research with a strong empirical component, and a venue for publishing empirical results relevant to both researchers and practitioners. Empirical studies presented here usually involve the collection and analysis of data and experience that can be used to characterize, evaluate and reveal relationships between software development deliverables, practices, and technologies. Over time, it is expected that such empirical results will form a body of knowledge leading to widely accepted and well-formed theories. The journal also offers industrial experience reports detailing the application of software technologies - processes, methods, or tools - and their effectiveness in industrial settings. Empirical Software Engineering promotes the publication of industry-relevant research, to address the significant gap between research and practice.
Latest articles in this journal:
An empirical study on developers’ shared conversations with ChatGPT in GitHub pull requests and issues
Quality issues in machine learning software systems
An empirical study of token-based micro commits
Software product line testing: a systematic literature review
Consensus task interaction trace recommender to guide developers’ software navigation