A Taxonomy of Self-Admitted Technical Debt in Deep Learning Systems

arXiv - CS - Software Engineering | Pub Date: 2024-09-18 | DOI: arxiv-2409.11826

Federica Pepe, Fiorella Zampetti, Antonio Mastropaolo, Gabriele Bavota, Massimiliano Di Penta

Abstract

The development of Machine Learning (ML)-intensive and, more recently, Deep Learning (DL)-intensive systems requires suitable choices, e.g., in terms of technology, algorithms, and hyper-parameters. Such choices depend on developers' experience, as well as on proper experimentation. Due to limited time availability, developers may adopt suboptimal, sometimes temporary choices, leading to technical debt (TD) specifically related to the ML code. This paper empirically analyzes the presence of Self-Admitted Technical Debt (SATD) in DL systems. After selecting 100 open-source Python projects using popular DL frameworks, we identified SATD from their source comments and created a stratified sample of 443 SATD instances to analyze manually. We derived a taxonomy of DL-specific SATD through open coding, featuring seven categories and 41 leaves. The identified SATD categories pertain to different aspects of DL models, some of which are technological (e.g., due to hardware or libraries) and some related to suboptimal choices in the DL process, model usage, or configuration. Our findings indicate that DL-specific SATD differs from DL bugs found in previous studies, as it typically pertains to suboptimal solutions rather than functional (e.g., blocking) problems. Last but not least, we found that state-of-the-art static analysis tools do not help developers avoid such problems, and therefore, specific support is needed to cope with DL-specific SATD.
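
To make the notion of SATD in source comments concrete, the following is a minimal sketch of how candidate comments could be pulled from Python code. It is not the authors' detection procedure, which the abstract does not describe: the marker list (`SATD_MARKERS`), the function name `find_satd_comments`, and the example snippet are all assumptions chosen for illustration, and a simple keyword match like this would likely miss many real cases and flag some false ones.

```python
import io
import re
import tokenize

# Hypothetical marker list (an assumption for illustration only);
# the abstract does not state which patterns the study relied on.
SATD_MARKERS = re.compile(
    r"\b(TODO|FIXME|HACK|XXX|workaround|temporary)\b", re.IGNORECASE
)


def find_satd_comments(source: str) -> list[tuple[int, str]]:
    """Return (line_number, comment_text) for comments matching a marker."""
    hits = []
    # tokenize separates real comments from '#' characters inside strings.
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT and SATD_MARKERS.search(tok.string):
            hits.append((tok.start[0], tok.string.lstrip("# ").rstrip()))
    return hits


# Invented DL-flavoured snippet; it is only tokenized, never executed.
example = '''
import torch

# TODO: tune the learning rate, 0.01 was picked arbitrarily
optimizer_lr = 0.01
batch_size = 8  # temporary: larger batches run out of GPU memory
x = "# FIXME inside a string is not a comment"
'''

for line_no, text in find_satd_comments(example):
    print(line_no, text)
```

The two flagged comments illustrate the kind of debt the abstract describes: a hyper-parameter chosen without experimentation and a hardware-driven workaround, both suboptimal choices rather than functional bugs.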
