Federica Pepe, Fiorella Zampetti, Antonio Mastropaolo, Gabriele Bavota, Massimiliano Di Penta
{"title":"A Taxonomy of Self-Admitted Technical Debt in Deep Learning Systems","authors":"Federica Pepe, Fiorella Zampetti, Antonio Mastropaolo, Gabriele Bavota, Massimiliano Di Penta","doi":"arxiv-2409.11826","DOIUrl":null,"url":null,"abstract":"The development of Machine Learning (ML)- and, more recently, of Deep\nLearning (DL)-intensive systems requires suitable choices, e.g., in terms of\ntechnology, algorithms, and hyper-parameters. Such choices depend on\ndevelopers' experience, as well as on proper experimentation. Due to limited\ntime availability, developers may adopt suboptimal, sometimes temporary\nchoices, leading to a technical debt (TD) specifically related to the ML code.\nThis paper empirically analyzes the presence of Self-Admitted Technical Debt\n(SATD) in DL systems. After selecting 100 open-source Python projects using\npopular DL frameworks, we identified SATD from their source comments and\ncreated a stratified sample of 443 SATD to analyze manually. We derived a\ntaxonomy of DL-specific SATD through open coding, featuring seven categories\nand 41 leaves. The identified SATD categories pertain to different aspects of\nDL models, some of which are technological (e.g., due to hardware or libraries)\nand some related to suboptimal choices in the DL process, model usage, or\nconfiguration. Our findings indicate that DL-specific SATD differs from DL bugs\nfound in previous studies, as it typically pertains to suboptimal solutions\nrather than functional (\\eg blocking) problems. 
Last but not least, we found\nthat state-of-the-art static analysis tools do not help developers avoid such\nproblems, and therefore, specific support is needed to cope with DL-specific\nSATD.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Software Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11826","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
The development of Machine Learning (ML)- and, more recently, of Deep
Learning (DL)-intensive systems requires suitable choices, e.g., in terms of
technology, algorithms, and hyper-parameters. Such choices depend on
developers' experience, as well as on proper experimentation. Due to limited
time availability, developers may adopt suboptimal, sometimes temporary
choices, leading to technical debt (TD) specifically related to the ML code.
This paper empirically analyzes the presence of Self-Admitted Technical Debt
(SATD) in DL systems. After selecting 100 open-source Python projects using
popular DL frameworks, we identified SATD from their source comments and
created a stratified sample of 443 SATD instances to analyze manually. We derived a
taxonomy of DL-specific SATD through open coding, featuring seven categories
and 41 leaves. The identified SATD categories pertain to different aspects of
DL models, some of which are technological (e.g., due to hardware or libraries)
and some related to suboptimal choices in the DL process, model usage, or
configuration. Our findings indicate that DL-specific SATD differs from DL bugs
found in previous studies, as it typically pertains to suboptimal solutions
rather than functional (e.g., blocking) problems. Last but not least, we found
that state-of-the-art static analysis tools do not help developers avoid such
problems, and therefore, specific support is needed to cope with DL-specific
SATD.
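
The abstract says SATD was identified from source comments but does not describe the detection procedure. As an illustration only, the sketch below assumes a simple keyword heuristic over Python comments; the marker list `SATD_MARKERS` and the function `find_satd_comments` are hypothetical names, not the authors' method or tooling.

```python
import io
import tokenize

# Hypothetical marker list (an assumption for illustration); the abstract
# does not state which patterns or tools the authors used to detect SATD.
SATD_MARKERS = ("todo", "fixme", "hack", "xxx", "workaround")


def find_satd_comments(source: str) -> list[tuple[int, str]]:
    """Return (line_number, comment_text) for comments containing a marker."""
    hits = []
    # tokenize only reads the text; the source is never imported or executed.
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for tok in tokens:
        if tok.type == tokenize.COMMENT:
            text = tok.string.lstrip("#").strip()
            if any(marker in text.lower() for marker in SATD_MARKERS):
                hits.append((tok.start[0], text))
    return hits


# Made-up snippet resembling DL training code with self-admitted debt.
example = '''\
import torch

# TODO: tune the learning rate, 0.01 is a guess
lr = 0.01
batch_size = 8  # hack: small batch to fit in GPU memory
model = None  # plain comment, not debt
'''

for line_no, comment in find_satd_comments(example):
    print(line_no, comment)
```

A keyword match like this only surfaces candidate comments; deciding whether a candidate is DL-specific debt, and which taxonomy category it belongs to, is the manual step the abstract describes.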