Exploring the Architectural Impact of Possible Dependencies in Python Software

2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE). Pub Date: 2020-09-01. DOI: 10.1145/3324884.3416619

Wuxia Jin, Yuanfang Cai, R. Kazman, Gang Zhang, Q. Zheng, Ting Liu

Citations: 7

Abstract

Dependencies among software entities are the basis for much software analytics research and for many architecture analysis tools. Dynamically typed languages, such as Python, JavaScript and Ruby, tolerate the lack of explicit type references, making certain syntactic dependencies indiscernible in source code. We call these possible dependencies, in contrast with the explicit dependencies that are directly referenced in source code. Type inference techniques have been widely studied and applied, but existing architecture analysis research and tools have not taken possible dependencies into consideration. The fundamental question is: to what extent do these missing possible dependencies affect architecture analysis? To answer this question, we conducted an empirical study of 105 Python projects, using type inference techniques to manifest possible dependencies. Our study revealed that the architectural impact of possible dependencies is substantial, and higher than that of explicit dependencies: (1) file-level possible dependencies account for at least 27.93% of all file-level dependencies, and create dependency structures that differ from those formed by explicit dependencies alone, with an average difference of 30.71%; (2) adding possible dependencies significantly improves the precision (0.52% to 14.18%), recall (31.73% to 39.12%), and F1 scores (22.13% to 32.09%) of capturing co-change relations; (3) on average, a file involved in possible dependencies influences 28% more files and 42% more dependencies within architectural sub-spaces than a file involved in only explicit dependencies; (4) on average, a file involved in possible dependencies consumes 32% more maintenance effort. Consequently, maintainability scores reported by existing tools make a system written in these dynamic languages appear to be better modularized than it actually is. This evidence strongly suggests that possible dependencies have a more significant impact than explicit dependencies on architecture quality, and that architecture analysis and tools should assess and even emphasize the architectural impact of possible dependencies due to dynamic typing.
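To make the distinction between explicit and possible dependencies concrete, the following is a minimal illustrative sketch. It is not taken from the paper or from its 105 subject projects. The names `FileStore`, `MemoryCache`, `archive`, and `candidate_targets` are hypothetical, and the name-based matching is only a naive stand-in for the type inference techniques the abstract refers to.

```python
# Hypothetical example: all names below are illustrative assumptions,
# not code from the paper or from the projects it studied.


class FileStore:
    """Imagine this class is defined in store.py."""

    def __init__(self):
        self.saved = []

    def save(self, record):
        self.saved.append(record)


class MemoryCache:
    """Imagine this class is defined in cache.py."""

    def __init__(self):
        self.entries = {}

    def save(self, record):
        self.entries[id(record)] = record


def archive(store, record):
    """Imagine this function is defined in report.py.

    The file would contain no import of FileStore or MemoryCache and no
    type annotation on `store`, so a purely syntactic analysis sees no
    dependency from report.py to store.py or cache.py.
    """
    # Possible dependency: at run time this call may resolve to
    # FileStore.save or MemoryCache.save, depending on the argument.
    store.save(record)


def candidate_targets(method_name, classes):
    """Naive stand-in for type inference (an assumption of this sketch).

    Every class that defines a callable attribute named `method_name`
    is reported as a possible target of the call.
    """
    return sorted(
        cls.__name__
        for cls in classes
        if callable(getattr(cls, method_name, None))
    )


if __name__ == "__main__":
    store = FileStore()
    archive(store, {"id": 1})
    print(store.saved)
    print(candidate_targets("save", [FileStore, MemoryCache, dict]))
```

In this sketch, the call `store.save(record)` is the kind of reference that an explicit-dependency extractor would miss, while the candidate list approximates the possible dependencies that a type-inference-based analysis could surface.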
