Self-Claimed Assumptions in Deep Learning Frameworks: An Exploratory Study

Proceedings of the 25th International Conference on Evaluation and Assessment in Software Engineering. Pub Date: 2021-04-29. DOI: 10.1145/3463274.3463333

Chen Yang, Peng Liang, Liming Fu, Zengyang Li


Citations: 3

Abstract

Deep learning (DL) frameworks have been extensively designed, implemented, and used in software projects across many domains. However, due to a lack of knowledge or information, time pressure, complex context, and similar factors, various uncertainties emerge during development, leading to assumptions being made in DL frameworks. Although not all assumptions harm the frameworks, being unaware of certain assumptions can result in critical problems (e.g., system vulnerabilities and failures). As a first step toward addressing these problems, there is a need to explore and understand the assumptions made in DL frameworks. To this end, we conducted an exploratory study of self-claimed assumptions (SCAs), examining their distribution, classification, and impacts using code comments from nine popular DL framework projects on GitHub. The results are as follows: (1) 3,084 SCAs are scattered across 1,775 files in the nine DL frameworks, ranging from 1,460 SCAs (TensorFlow) to 8 SCAs (Keras). (2) SCAs fall into four types by validity: Valid SCA, Invalid SCA, Conditional SCA, and Unknown SCA, and into four types by content: Configuration and Context SCA, Design SCA, Tensor and Variable SCA, and Miscellaneous SCA. (3) Both valid and invalid SCAs may have an impact on the DL frameworks within a specific scope (e.g., within a function). Making SCAs induces certain technical debt, and source code is written and decisions are made based on SCAs. This is the first study to investigate SCAs in DL frameworks, and it helps researchers and practitioners gain a comprehensive understanding of the assumptions made. We also provide the first dataset of SCAs for further research and practice in this area.
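The abstract says SCAs were drawn from code comments but does not describe the extraction procedure. As a rough illustration only, the sketch below assumes that a candidate SCA is any `#` comment containing a word that starts with "assum", and tallies candidates per file. The keyword stem, the function names, and the naive comment splitting are all hypothetical choices made for this example, not the authors' method.

```python
import re
from collections import Counter

# Hypothetical keyword stem: the abstract does not state which search terms were used.
ASSUMPTION_PATTERN = re.compile(r"\bassum\w*", re.IGNORECASE)


def find_candidate_scas(source: str) -> list[tuple[int, str]]:
    """Return (line_number, comment_text) for '#' comments that mention an assumption.

    Splitting on the first '#' is a simplification: a '#' inside a string
    literal would be treated as the start of a comment.
    """
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        _, marker, comment = line.partition("#")
        if marker and ASSUMPTION_PATTERN.search(comment):
            hits.append((lineno, comment.strip()))
    return hits


def count_by_file(files: dict[str, str]) -> Counter:
    """Map each file path to its number of candidate SCAs, skipping files with none."""
    counts = Counter()
    for path, source in files.items():
        n = len(find_candidate_scas(source))
        if n:
            counts[path] = n
    return counts
```

Summing the returned counter gives a total candidate count, and its length gives the number of files containing at least one candidate, which mirrors the kind of distribution figures reported in result (1). Any real study would still need manual review to classify each candidate by validity and content.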


