Pub Date: 2021-05-01 | DOI: 10.1109/ICSE43902.2021.00075
Don’t Do That! Hunting Down Visual Design Smells in Complex UIs Against Design Guidelines
Bo Yang, Zhenchang Xing, Xin Xia, Chunyang Chen, Deheng Ye, Shanping Li
Just like code smells in source code, UI design has visual design smells. We study 93 don't-do-that guidelines in Material Design, a complex design system created by Google. We find that these don't-guidelines go far beyond UI aesthetics, involving seven general design dimensions (layout, typography, iconography, navigation, communication, color, and shape) and four component design aspects (anatomy, placement, behavior, and usage). Violating these guidelines results in visual design smells in UIs (or UI design smells). In a study of 60,756 UIs of 9,286 Android apps, we find that 7,497 UIs of 2,587 apps have at least one violation of some Material Design guideline. This reveals a lack of developer training and tool support for avoiding UI design smells. To fill this gap, we design an automated UI design smell detector (UIS-Hunter) that extracts and validates multi-modal UI information (component metadata, typography, iconography, color, and edge) to detect violations of diverse don't-guidelines in Material Design. The detection accuracy of UIS-Hunter is high (precision=0.81, recall=0.90) on the 60,756 UIs of the 9,286 apps. We build a guideline gallery of real-world UI design smells detected by UIS-Hunter, so that developers can learn Material Design best practices. Our user studies show that UIS-Hunter is more effective than manual detection of UI design smells, and that the UI design smells detected by UIS-Hunter have severe negative impacts on app users.
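To make the idea concrete, here is a minimal sketch of what one automated don't-guideline check might look like. The component-metadata schema, the `check_single_fab` helper, and the single-floating-action-button rule are illustrative assumptions, not UIS-Hunter's actual API or rule set.

```python
# Hypothetical sketch of a single don't-guideline validator in the spirit of
# UIS-Hunter; the metadata schema and rule below are assumptions, not the
# tool's real interface.

def check_single_fab(ui_components):
    """Material Design advises against multiple floating action buttons
    on one screen; flag UIs that appear to violate this."""
    fabs = [c for c in ui_components if c.get("type") == "FloatingActionButton"]
    if len(fabs) > 1:
        return {"guideline": "single-FAB-per-screen", "violations": fabs}
    return None

# Example component metadata, as it might be extracted from a UI screenshot:
ui = [
    {"type": "FloatingActionButton", "bounds": [900, 1500, 1020, 1620]},
    {"type": "FloatingActionButton", "bounds": [60, 1500, 180, 1620]},
    {"type": "TopAppBar", "bounds": [0, 0, 1080, 160]},
]
print(check_single_fab(ui))  # reports a UI design smell
```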
{"title":"Don’t Do That! Hunting Down Visual Design Smells in Complex UIs Against Design Guidelines","authors":"Bo Yang, Zhenchang Xing, Xin Xia, Chunyang Chen, Deheng Ye, Shanping Li","doi":"10.1109/ICSE43902.2021.00075","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00075","url":null,"abstract":"Just like code smells in source code, UI design has visual design smells. We study 93 don't-do-that guidelines in the Material Design, a complex design system created by Google. We find that these don't-guidelines go far beyond UI aesthetics, and involve seven general design dimensions (layout, typography, iconography, navigation, communication, color, and shape) and four component design aspects (anatomy, placement, behavior, and usage). Violating these guidelines results in visual design smells in UIs (or UI design smells). In a study of 60,756 UIs of 9,286 Android apps, we find that 7,497 UIs of 2,587 apps have at least one violation of some Material Design guidelines. This reveals the lack of developer training and tool support to avoid UI design smells. To fill this gap, we design an automated UI design smell detector (UIS-Hunter) that extracts and validates multi-modal UI information (component metadata, typography, iconography, color, and edge) for detecting the violation of diverse don't-guidelines in Material Design. The detection accuracy of UIS-Hunter is high (precision=0.81, recall=0.90) on the 60,756 UIs of 9,286 apps. We build a guideline gallery with real-world UI design smells that UIS-Hunter detects for developers to learn the best Material Design practices. Our user studies show that UIS-Hunter is more effective than manual detection of UI design smells, and the UI design smells that are detected by UIS-Hunter have severely negative impacts on app users.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115314841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-05-01 | DOI: 10.1109/ICSE43902.2021.00033
An Empirical Study of Refactorings and Technical Debt in Machine Learning Systems
Yiming Tang, Raffi Khatchadourian, M. Bagherzadeh, Rhia Singh, Ajani Stewart, A. Raja
Machine Learning (ML) systems, including Deep Learning (DL) systems, i.e., those with ML capabilities, are pervasive in today's data-driven society. Such systems are complex: they comprise ML models and many subsystems that support learning processes. As with other complex systems, ML systems are prone to classic technical debt issues, especially when they are long-lived, but they also exhibit debt specific to ML. Unfortunately, there is a knowledge gap in how ML systems actually evolve and are maintained. In this paper, we fill this gap by studying refactorings, i.e., source-to-source semantics-preserving program transformations, performed in real-world, open-source software, and the technical debt issues they alleviate. We analyzed 26 projects, comprising 4.2 MLOC, along with 327 manually examined code patches. The results indicate that developers refactor these systems for a variety of reasons, both specific and tangential to ML; that some refactorings correspond to established technical debt categories while others do not; and that code duplication is a major cross-cutting theme, particularly involving ML configuration and model code, which was also the most refactored. We also introduce 14 new ML-specific refactorings and 7 new technical debt categories, and put forth several recommendations, best practices, and anti-patterns. The results can potentially assist practitioners, tool developers, and educators in facilitating long-term ML system usefulness.
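As a concrete, hypothetical illustration of the duplication theme (not an example from the paper's corpus), consider two training scripts that repeat the same optimizer configuration; the sketch below, with invented names, shows the kind of extract-function refactoring that removes such ML-configuration duplication.

```python
# Before: the same optimizer hyperparameters duplicated verbatim in
# train_baseline.py and train_large.py, e.g.:
#   optimizer = Adam(lr=1e-3, beta_1=0.9, beta_2=0.999)

# After: duplication removed via a single, parameterized helper.
def make_optimizer_config(lr=1e-3):
    """Single source of truth for the shared optimizer hyperparameters."""
    return {"name": "adam", "lr": lr, "beta_1": 0.9, "beta_2": 0.999}

baseline_cfg = make_optimizer_config()
large_cfg = make_optimizer_config(lr=3e-4)  # only the delta is stated
```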
{"title":"An Empirical Study of Refactorings and Technical Debt in Machine Learning Systems","authors":"Yiming Tang, Raffi Khatchadourian, M. Bagherzadeh, Rhia Singh, Ajani Stewart, A. Raja","doi":"10.1109/ICSE43902.2021.00033","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00033","url":null,"abstract":"Machine Learning (ML), including Deep Learning (DL), systems, i.e., those with ML capabilities, are pervasive in today's data-driven society. Such systems are complex; they are comprised of ML models and many subsystems that support learning processes. As with other complex systems, ML systems are prone to classic technical debt issues, especially when such systems are long-lived, but they also exhibit debt specific to these systems. Unfortunately, there is a gap of knowledge in how ML systems actually evolve and are maintained. In this paper, we fill this gap by studying refactorings, i.e., source-to-source semantics-preserving program transformations, performed in real-world, open-source software, and the technical debt issues they alleviate. We analyzed 26 projects, consisting of 4.2 MLOC, along with 327 manually examined code patches. The results indicate that developers refactor these systems for a variety of reasons, both specific and tangential to ML, some refactorings correspond to established technical debt categories, while others do not, and code duplication is a major cross-cutting theme that particularly involved ML configuration and model code, which was also the most refactored. We also introduce 14 and 7 new ML-specific refactorings and technical debt categories, respectively, and put forth several recommendations, best practices, and anti-patterns. The results can potentially assist practitioners, tool developers, and educators in facilitating long-term ML system usefulness.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129527493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-05-01 | DOI: 10.1109/ICSE43902.2021.00093
What Makes a Great Maintainer of Open Source Projects?
E. Dias, Paulo Meirelles, F. C. Filho, Igor Steinmacher, I. Wiese, G. Pinto
Although Open Source Software (OSS) maintainers devote a significant proportion of their work to coding tasks, great maintainers must excel in many activities beyond coding. Maintainers should care about fostering a community and helping new members find their place, while also saying "no" to patches that, although well-coded and well-tested, do not contribute to the goal of the project. To perform all these activities masterfully, maintainers must exercise attributes that software engineers working on closed-source projects do not always need to master. This paper aims to uncover, relate, and prioritize the unique attributes that great OSS maintainers might have. To achieve this goal, we conducted 33 semi-structured interviews with highly experienced maintainers who are the gatekeepers of notable projects such as the Linux kernel, the Debian operating system, and the GitLab coding platform. After analyzing the interviews and curating a list of attributes, we created a conceptual framework to explain how these attributes are connected. We then conducted a rating survey with 90 OSS contributors. We noted that "technical excellence" and "communication" are the most recurring attributes. When grouped, these attributes fit into four broad categories: management, social, technical, and personality. While "sustaining a long-term vision of the project" and being "extremely careful" seem to form the basis of our framework, our survey showed that the communication attribute was perceived as the most essential one.
{"title":"What Makes a Great Maintainer of Open Source Projects?","authors":"E. Dias, Paulo Meirelles, F. C. Filho, Igor Steinmacher, I. Wiese, G. Pinto","doi":"10.1109/ICSE43902.2021.00093","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00093","url":null,"abstract":"Although Open Source Software (OSS) maintainers devote a significant proportion of their work to coding tasks, great maintainers must excel in many other activities beyond coding. Maintainers should care about fostering a community, helping new members to find their place, while also saying \"no\" to patches that although are well-coded and well-tested, do not contribute to the goal of the project. To perform all these activities masterfully, maintainers should exercise attributes that software engineers (working on closed source projects) do not always need to master. This paper aims to uncover, relate, and prioritize the unique attributes that great OSS maintainers might have. To achieve this goal, we conducted 33 semi-structured interviews with well-experienced maintainers that are the gatekeepers of notable projects such as the Linux Kernel, the Debian operating system, and the GitLab coding platform. After we analyzed the interviews and curated a list of attributes, we created a conceptual framework to explain how these attributes are connected. We then conducted a rating survey with 90 OSS contributors. We noted that \"technical excellence\" and \"communication\" are the most recurring attributes. When grouped, these attributes fit into four broad categories: management, social, technical, and personality. While we noted that \"sustain a long term vision of the project\" and being \"extremely careful\" seem to form the basis of our framework, we noted through our survey that the communication attribute was perceived as the most essential one.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114760125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-05-01 | DOI: 10.1109/ICSE43902.2021.00028
Resource-Guided Configuration Space Reduction for Deep Learning Models
Yanjie Gao, Yonghao Zhu, Hongyu Zhang, Haoxiang Lin, Mao Yang
Deep learning models, like traditional software systems, provide a large number of configuration options: a deep learning model can be configured with different hyperparameters and neural architectures. Recently, AutoML (Automated Machine Learning) has been widely adopted to automate model training by systematically exploring diverse configurations. However, current AutoML approaches do not take into consideration the computational constraints imposed by various resources, such as available memory, the computing power of devices, or execution time. Training with non-conforming configurations can lead to many failed AutoML trial jobs or inappropriate models, which causes significant resource waste and severely slows down development. In this paper, we propose DnnSAT, a resource-guided AutoML approach for deep learning models that helps existing AutoML tools efficiently reduce the configuration space ahead of time. DnnSAT can speed up the search process and achieve equal or even better model learning performance because it excludes trial jobs that do not satisfy the constraints and saves resources for more trials. We formulate resource-guided configuration space reduction as a constraint satisfaction problem. DnnSAT includes a unified analytic cost model to construct common constraints with respect to model weight size, number of floating-point operations, model inference time, and GPU memory consumption. It then utilizes an SMT solver to obtain the satisfiable configurations of hyperparameters and neural architectures. Our evaluation results demonstrate the effectiveness of DnnSAT in accelerating state-of-the-art AutoML methods (hyperparameter optimization and neural architecture search), with an average speedup from 1.19X to 3.95X on public benchmarks. We believe that DnnSAT can make AutoML more practical in real-world environments with constrained resources.
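To illustrate the constraint-satisfaction formulation, here is a minimal sketch using the Z3 SMT solver's Python bindings (the z3-solver package). The parameter bounds and the 2M-parameter memory budget are invented for illustration and stand in for DnnSAT's unified analytic cost model.

```python
# Minimal sketch: encode a configuration space plus a toy resource constraint
# and ask an SMT solver for a satisfiable configuration. The constants here
# are assumptions, not DnnSAT's actual cost model.
from z3 import Int, Solver, sat

hidden = Int("hidden_units")  # hyperparameters under search
layers = Int("num_layers")

s = Solver()
s.add(hidden >= 64, hidden <= 1024, layers >= 1, layers <= 8)

# Toy analytic cost model: total weight count must fit a (made-up) budget of
# 2M parameters, mirroring DnnSAT's model-weight-size constraint.
s.add(hidden * hidden * layers <= 2_000_000)

if s.check() == sat:
    m = s.model()
    print("satisfiable config:", m[hidden], "units,", m[layers], "layers")
```

Non-conforming configurations are pruned before any training job is launched, which is where the reported speedups come from.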
{"title":"Resource-Guided Configuration Space Reduction for Deep Learning Models","authors":"Yanjie Gao, Yonghao Zhu, Hongyu Zhang, Haoxiang Lin, Mao Yang","doi":"10.1109/ICSE43902.2021.00028","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00028","url":null,"abstract":"Deep learning models, like traditional software systems, provide a large number of configuration options. A deep learning model can be configured with different hyperparameters and neural architectures. Recently, AutoML (Automated Machine Learning) has been widely adopted to automate model training by systematically exploring diverse configurations. However, current AutoML approaches do not take into consideration the computational constraints imposed by various resources such as available memory, computing power of devices, or execution time. The training with non-conforming configurations could lead to many failed AutoML trial jobs or inappropriate models, which cause significant resource waste and severely slow down development productivity. In this paper, we propose DnnSAT, a resource-guided AutoML approach for deep learning models to help existing AutoML tools efficiently reduce the configuration space ahead of time. DnnSAT can speed up the search process and achieve equal or even better model learning performance because it excludes trial jobs not satisfying the constraints and saves resources for more trials. We formulate the resource-guided configuration space reduction as a constraint satisfaction problem. DnnSAT includes a unified analytic cost model to construct common constraints with respect to the model weight size, number of floating-point operations, model inference time, and GPU memory consumption. It then utilizes an SMT solver to obtain the satisfiable configurations of hyperparameters and neural architectures. Our evaluation results demonstrate the effectiveness of DnnSAT in accelerating state-of-the-art AutoML methods (Hyperparameter Optimization and Neural Architecture Search) with an average speedup from 1.19X to 3.95X on public benchmarks. We believe that DnnSAT can make AutoML more practical in a real-world environment with constrained resources.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131847543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-05-01 | DOI: 10.1109/ICSE43902.2021.00128
AID: An Automated Detector for Gender-Inclusivity Bugs in OSS Project Pages
Amreeta Chatterjee, M. Guizani, Catherine Stevens, Jillian Emard, Mary Evelyn May, M. Burnett, Iftekhar Ahmed, A. Sarma
The tools and infrastructure used in tech, including Open Source Software (OSS), can embed "inclusivity bugs": features that disproportionately disadvantage particular groups of contributors. To see whether OSS developers have existing practices to ward off such bugs, we surveyed 266 OSS developers. Our results show that a majority (77%) of developers do not use any inclusivity practices, and 92% of respondents cited a lack of concrete resources to enable them to do so. To help fill this gap, this paper introduces AID, a tool that automates the GenderMag method to systematically find gender-inclusivity bugs in software. We then present the results of the tool's evaluation on 20 GitHub projects. The tool achieved a precision of 0.69, a recall of 0.92, and an F-measure of 0.79, and even captured some inclusivity bugs that human GenderMag teams missed.
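As a quick sanity check, the reported F-measure follows from the precision and recall via the harmonic mean:

```python
# F-measure (F1) as the harmonic mean of precision and recall.
precision, recall = 0.69, 0.92
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 0.79, matching the figure reported above
```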
{"title":"AID: An Automated Detector for Gender-Inclusivity Bugs in OSS Project Pages","authors":"Amreeta Chatterjee, M. Guizani, Catherine Stevens, Jillian Emard, Mary Evelyn May, M. Burnett, Iftekhar Ahmed, A. Sarma","doi":"10.1109/ICSE43902.2021.00128","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00128","url":null,"abstract":"The tools and infrastructure used in tech, including Open Source Software (OSS), can embed \"inclusivity bugs\"- features that disproportionately disadvantage particular groups of contributors. To see whether OSS developers have existing practices to ward off such bugs, we surveyed 266 OSS developers. Our results show that a majority (77%) of developers do not use any inclusivity practices, and 92% of respondents cited a lack of concrete resources to enable them to do so. To help fill this gap, this paper introduces AID, a tool that automates the GenderMag method to systematically find gender-inclusivity bugs in software. We then present the results of the tool's evaluation on 20 GitHub projects. The tool achieved precision of 0.69, recall of 0.92, an F-measure of 0.79 and even captured some inclusivity bugs that human GenderMag teams missed.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131366799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-05-01 | DOI: 10.1109/ICSE43902.2021.00056
Program Comprehension and Code Complexity Metrics: An fMRI Study
Norman Peitek, S. Apel, Chris Parnin, A. Brechmann, J. Siegmund
Background: Researchers and practitioners have used code complexity metrics for decades to predict how developers comprehend a program. While it is plausible and tempting to use code metrics for this purpose, their validity is debated, since they rely on simple code properties and rarely consider particularities of human cognition. Aims: We investigate whether and how code complexity metrics reflect the difficulty of program comprehension. Method: We conducted a functional magnetic resonance imaging (fMRI) study with 19 participants, observing program comprehension of short code snippets at varying complexity levels. We dissected four classes of code complexity metrics and their relationship to neuronal, behavioral, and subjective correlates of program comprehension, analyzing more than 41 metrics overall. Results: While our data corroborate that complexity metrics can, to a limited degree, explain programmers' cognition in program comprehension, fMRI allowed us to gain insights into why some code properties are difficult to process. In particular, a code's textual size drives programmers' attention, and vocabulary size burdens programmers' working memory. Conclusion: Our results provide neuroscientific evidence supporting prior research's warnings questioning the validity of code complexity metrics, and they pin down factors relevant to program comprehension. Future Work: We outline several follow-up experiments investigating fine-grained effects of code complexity and describe possible refinements to code complexity metrics.
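To make the two highlighted properties concrete, the toy sketch below computes a snippet's textual size (token count) and vocabulary size (distinct tokens). Real complexity-metric suites are far more elaborate, and this regex tokenizer is only an approximation.

```python
# Toy illustration of the two code properties the study singles out:
# textual size (token count) and vocabulary size (distinct tokens).
import re

snippet = """
def mid(a, b, c):
    if a < b:
        return b if b < c else max(a, c)
    return a if a < c else max(b, c)
"""

tokens = re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", snippet)
print("textual size (tokens):", len(tokens))  # drives attention
print("vocabulary size:", len(set(tokens)))   # burdens working memory
```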
{"title":"Program Comprehension and Code Complexity Metrics: An fMRI Study","authors":"Norman Peitek, S. Apel, Chris Parnin, A. Brechmann, J. Siegmund","doi":"10.1109/ICSE43902.2021.00056","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00056","url":null,"abstract":"Background: Researchers and practitioners have been using code complexity metrics for decades to predict how developers comprehend a program. While it is plausible and tempting to use code metrics for this purpose, their validity is debated, since they rely on simple code properties and rarely consider particularities of human cognition. Aims: We investigate whether and how code complexity metrics reflect difficulty of program comprehension. Method: We have conducted a functional magnetic resonance imaging (fMRI) study with 19 participants observing program comprehension of short code snippets at varying complexity levels. We dissected four classes of code complexity metrics and their relationship to neuronal, behavioral, and subjective correlates of program comprehension, overall analyzing more than 41 metrics. Results: While our data corroborate that complexity metrics can-to a limited degree-explain programmers' cognition in program comprehension, fMRI allowed us to gain insights into why some code properties are difficult to process. In particular, a code's textual size drives programmers' attention, and vocabulary size burdens programmers' working memory. Conclusion: Our results provide neuro-scientific evidence supporting warnings of prior research questioning the validity of code complexity metrics and pin down factors relevant to program comprehension. Future Work: We outline several follow-up experiments investigating fine-grained effects of code complexity and describe possible refinements to code complexity metrics.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116408590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-05-01 | DOI: 10.1109/ICSE43902.2021.00052
How Developers Optimize Virtual Reality Applications: A Study of Optimization Commits in Open Source Unity Projects
Fariha Nusrat, Foyzul Hassan, Hao Zhong, Xiaoyin Wang
Virtual Reality (VR) is an emerging technology that provides an immersive experience for users. Due to the high computational cost of rendering real-time animation twice (once for each eye) and the resource limitations of wearable devices, VR applications often face performance bottlenecks, and performance optimization plays an important role in VR software development. Performance optimizations of VR applications can be very different from those in traditional software, as VR involves more elements such as graphics rendering and real-time animation. In this paper, we present the first empirical study of 183 real-world performance optimizations from 45 VR software projects. In particular, we manually categorized the optimizations into 11 categories and applied static analysis to identify how they affect different life-cycle phases of VR applications. Furthermore, we studied the complexity and design/behavior effects of performance optimizations, and how optimizations differ between large organizational software projects and smaller personal software projects. Our major findings include: (1) graphics simplification (24.0%), rendering optimization (16.9%), language/API optimization (15.3%), heap avoidance (14.8%), and value caching (12.0%) are the most common categories of performance optimization in VR applications; (2) game logic updates (30.4%) and before-scene initialization (20.0%) are the most common life-cycle phases affected by performance issues; (3) 45.9% of the optimizations have behavior and design effects, and 39.3% of the optimizations are systematic changes; (4) the distributions of optimization classes are very different between organizational VR projects and personal VR projects.
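As a language-neutral illustration of one of these categories, value caching, the sketch below hoists a frame-invariant computation out of a per-frame loop. In the studied Unity projects this would be C# inside an Update() method; the Python example and its names are invented here purely for illustration.

```python
# Value caching: compute an expensive, frame-invariant value once and reuse
# it every frame, instead of recomputing it inside the per-frame loop.
import math

def expensive_path_length(waypoints):
    return sum(math.dist(p, q) for p, q in zip(waypoints, waypoints[1:]))

waypoints = [(0, 0), (3, 4), (6, 8)]

cached_length = expensive_path_length(waypoints)  # computed once, up front
for frame in range(3):            # stand-in for the per-frame update loop
    print(frame, cached_length)   # reuse the cached value each frame
```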
{"title":"How Developers Optimize Virtual Reality Applications: A Study of Optimization Commits in Open Source Unity Projects","authors":"Fariha Nusrat, Foyzul Hassan, Hao Zhong, Xiaoyin Wang","doi":"10.1109/ICSE43902.2021.00052","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00052","url":null,"abstract":"Virtual Reality (VR) is an emerging technique that provides immersive experience for users. Due to the high computation cost of rendering real-time animation twice (for both eyes) and the resource limitation of wearable devices, VR applications often face performance bottlenecks and performanceoptimization plays an important role in VR software develop-ment. Performance optimizations of VR applications can be very different from those in traditional software as VR involves more elements such as graphics rendering and real-time animation. In this paper, we present the first empirical study on 183 real-world performance optimizations from 45 VR software projects. In particular, we manually categorized the optimizations in to 11 categories, and applied static analysis to identify how they affect different life-cycle phases of VR applications. Furthermore, we studied the complexity and design / behavior effects of performance optimizations, and how optimizations are different between large organizational software projects and smaller personal software projects. Our major findings include: (1) graphics simplification (24.0%), rendering optimization (16.9%), language / API optimization (15.3%), heap avoidance (14.8%), and valuecaching (12.0%) are the most common categories of performance optimization in VR applications; (2) game logic updates (30.4%) and before-scene initialization (20.0%) are the most common life-cycle phases affected by performance issues; (3) 45.9% of the optimizations have behavior and design effects and 39.3% of the optimizations are systematic changes; (4) the distributionsof optimization classes are very different between organizational VR projects and personal VR projects.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124500204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Neural Network (DNN) testing is one of the most widely used ways to guarantee the quality of DNNs. However, labeling test inputs to check the correctness of DNN predictions is very costly, which can largely affect the efficiency of DNN testing and even the whole process of DNN development. To relieve this labeling-cost problem, we propose a novel test input prioritization approach for DNNs (called PRIMA), based on intelligent mutation analysis, that labels more bug-revealing test inputs earlier within a limited time budget, thereby improving the efficiency of DNN testing. PRIMA is based on a key insight: a test input that kills many mutated models and produces different prediction results with many mutated inputs is more likely to reveal DNN bugs, and thus should be prioritized higher. After obtaining a number of mutation results from a series of our designed model and input mutation rules for each test input, PRIMA further incorporates learning-to-rank (a kind of supervised machine learning for solving ranking problems) to intelligently combine these mutation results for effective test input prioritization. We conducted an extensive study on 36 popular subjects, carefully considering their diversity along five dimensions (i.e., different domains of test inputs, different DNN tasks, different network structures, different types of test inputs, and different training scenarios). Our experimental results demonstrate the effectiveness of PRIMA, which significantly outperforms state-of-the-art approaches (with an average improvement of 8.50%~131.01% in terms of prioritization effectiveness). In particular, we have applied PRIMA to practical autonomous-vehicle testing at a large motor company, and the results on 4 real-world scene-recognition models in autonomous vehicles further confirm the practicality of PRIMA.
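The following is a simplified sketch of that insight, not PRIMA's actual implementation: it scores each test input by mutated-model kills plus mutated-input prediction disagreements, with a fixed sum standing in for the learned ranking model.

```python
# Simplified mutation-based prioritization: rank test inputs by how many
# mutated models they kill plus how many mutated inputs flip the prediction.
def kill_count(model_mutants, x, base_pred):
    return sum(m(x) != base_pred for m in model_mutants)

def disagreement_count(mutated_inputs, model, base_pred):
    return sum(model(mx) != base_pred for mx in mutated_inputs)

def prioritize(tests, model, model_mutants, input_mutator):
    scored = []
    for x in tests:
        base = model(x)
        k = kill_count(model_mutants, x, base)
        d = disagreement_count(input_mutator(x), model, base)
        scored.append((k + d, x))  # stand-in for the learned ranker
    return [x for _, x in sorted(scored, key=lambda t: t[0], reverse=True)]

# Toy usage with threshold "models" over numeric inputs:
model = lambda x: x > 0.5
model_mutants = [lambda x: x > 0.4, lambda x: x > 0.6, lambda x: x > 0.55]
input_mutator = lambda x: [x + d for d in (-0.05, 0.0, 0.05)]
print(prioritize([0.1, 0.52, 0.9], model, model_mutants, input_mutator))
# The borderline input 0.52 is prioritized first.
```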
{"title":"Prioritizing Test Inputs for Deep Neural Networks via Mutation Analysis","authors":"Zan Wang, Hanmo You, Junjie Chen, Yingyi Zhang, Xuyuan Dong, Wenbin Zhang","doi":"10.1109/ICSE43902.2021.00046","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00046","url":null,"abstract":"Deep Neural Network (DNN) testing is one of the most widely-used ways to guarantee the quality of DNNs. However, labeling test inputs to check the correctness of DNN prediction is very costly, which could largely affect the efficiency of DNN testing, even the whole process of DNN development. To relieve the labeling-cost problem, we propose a novel test input prioritization approach (called PRIMA) for DNNs via intelligent mutation analysis in order to label more bug-revealing test inputs earlier for a limited time, which facilitates to improve the efficiency of DNN testing. PRIMA is based on the key insight: a test input that is able to kill many mutated models and produce different prediction results with many mutated inputs, is more likely to reveal DNN bugs, and thus it should be prioritized higher. After obtaining a number of mutation results from a series of our designed model and input mutation rules for each test input, PRIMA further incorporates learning-to-rank (a kind of supervised machine learning to solve ranking problems) to intelligently combine these mutation results for effective test input prioritization. We conducted an extensive study based on 36 popular subjects by carefully considering their diversity from five dimensions (i.e., different domains of test inputs, different DNN tasks, different network structures, different types of test inputs, and different training scenarios). Our experimental results demonstrate the effectiveness of PRIMA, significantly outperforming the state-of-the-art approaches (with the average improvement of 8.50%~131.01% in terms of prioritization effectiveness). In particular, we have applied PRIMA to the practical autonomous-vehicle testing in a large motor company, and the results on 4 real-world scene-recognition models in autonomous vehicles further confirm the practicability of PRIMA.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"142 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125770756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-05-01 | DOI: 10.1109/ICSE43902.2021.00059
IdBench: Evaluating Semantic Representations of Identifier Names in Source Code
Yaza Wainakh, Moiz Rauf, Michael Pradel
Identifier names convey useful information about the intended semantics of code. Name-based program analyses use this information, e.g., to detect bugs, to predict types, and to improve the readability of code. At the core of name-based analyses are semantic representations of identifiers, e.g., in the form of learned embeddings. The high-level goal of such a representation is to encode whether two identifiers, e.g., len and size, are semantically similar. Unfortunately, it is currently unclear to what extent semantic representations match the semantic relatedness and similarity perceived by developers. This paper presents IdBench, the first benchmark for evaluating semantic representations against a ground truth created from thousands of ratings by 500 software developers. We use IdBench to study state-of-the-art embedding techniques proposed for natural language, an embedding technique specifically designed for source code, and lexical string distance functions. Our results show that the effectiveness of semantic representations varies significantly and that the best available embeddings successfully represent semantic relatedness. On the downside, no existing technique provides a satisfactory representation of semantic similarities, in part because identifiers with opposing meanings are incorrectly considered similar, which may lead to fatal mistakes, e.g., in a refactoring tool. Studying the strengths and weaknesses of the different techniques shows that they complement each other. As a first step toward exploiting this complementarity, we present an ensemble model that combines existing techniques and clearly outperforms the best available semantic representation.
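A hedged sketch of the kind of comparison such a benchmark enables: cosine similarity between identifier embeddings, including the pitfall where identifiers with opposing meanings embed close together. The vectors below are invented toy values, not real learned embeddings.

```python
# Cosine similarity between (toy) identifier embeddings.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

embeddings = {  # invented 3-d vectors for illustration only
    "len":   np.array([0.9, 0.1, 0.0]),
    "size":  np.array([0.8, 0.3, 0.1]),
    "open":  np.array([0.1, 0.9, 0.0]),
    "close": np.array([0.2, 0.8, 0.1]),
}

print(cosine(embeddings["len"], embeddings["size"]))    # high: truly similar
# The pitfall noted above: opposites often embed close together.
print(cosine(embeddings["open"], embeddings["close"]))  # high, but misleading
```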
{"title":"IdBench: Evaluating Semantic Representations of Identifier Names in Source Code","authors":"Yaza Wainakh, Moiz Rauf, Michael Pradel","doi":"10.1109/ICSE43902.2021.00059","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00059","url":null,"abstract":"Identifier names convey useful information about the intended semantics of code. Name-based program analyses use this information, e.g., to detect bugs, to predict types, and to improve the readability of code. At the core of name-based analyses are semantic representations of identifiers, e.g., in the form of learned embeddings. The high-level goal of such a representation is to encode whether two identifiers, e.g., len and size, are semantically similar. Unfortunately, it is currently unclear to what extent semantic representations match the semantic relatedness and similarity perceived by developers. This paper presents IdBench, the first benchmark for evaluating semantic representations against a ground truth created from thousands of ratings by 500 software developers. We use IdBench to study state-of-the-art embedding techniques proposed for natural language, an embedding technique specifically designed for source code, and lexical string distance functions. Our results show that the effectiveness of semantic representations varies significantly and that the best available embeddings successfully represent semantic relatedness. On the downside, no existing technique provides a satisfactory representation of semantic similarities, among other reasons because identifiers with opposing meanings are incorrectly considered to be similar, which may lead to fatal mistakes, e.g., in a refactoring tool. Studying the strengths and weaknesses of the different techniques shows that they complement each other. As a first step toward exploiting this complementarity, we present an ensemble model that combines existing techniques and that clearly outperforms the best available semantic representation.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129869044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-05-01 | DOI: 10.1109/ICSE43902.2021.00014
Playing Planning Poker in Crowds: Human Computation of Software Effort Estimates
Mohammed Alhamed, Tim Storer
Reliable, cost-effective effort estimation remains a considerable challenge for software projects. Recent work has demonstrated that the popular Planning Poker practice can produce reliable estimates when undertaken within a software team of knowledgeable domain experts. However, the process depends on the availability of experts and can be time-consuming to perform, making it impractical for large-scale or open-source projects that may curate many thousands of outstanding tasks. This paper reports on a full study investigating the feasibility of having crowd workers, supplied with limited information about a task, provide comparably accurate estimates using Planning Poker. We describe the design of a Crowd Planning Poker (CPP) process implemented on Amazon Mechanical Turk and the results of a substantial set of trials involving more than 5,000 crowd workers and 39 diverse software tasks. Our results show that a carefully organised and selected crowd of workers can produce effort estimates of similar accuracy to those of a single expert.
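As an illustration of aggregating crowd estimates, the sketch below takes the median of a discussion round and snaps it to the nearest Planning Poker card. The card deck and the median-based consensus rule are assumptions for illustration, not necessarily the paper's exact CPP protocol.

```python
# Aggregate one task's crowd estimates into a single story-point value.
from statistics import median

DECK = [1, 2, 3, 5, 8, 13, 21]  # a common story-point deck (assumed)

def nearest_card(value):
    return min(DECK, key=lambda c: abs(c - value))

round1 = [3, 5, 8, 5, 13, 5, 8]      # independent first estimates
round2 = [5, 5, 8, 5, 8, 5, 8]       # re-estimates after discussion
print(nearest_card(median(round2)))  # crowd estimate: 5 story points
```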
{"title":"Playing Planning Poker in Crowds: Human Computation of Software Effort Estimates","authors":"Mohammed Alhamed, Tim Storer","doi":"10.1109/ICSE43902.2021.00014","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00014","url":null,"abstract":"Reliable cost effective effort estimation remains a considerable challenge for software projects. Recent work has demonstrated that the popular Planning Poker practice can produce reliable estimates when undertaken within a software team of knowledgeable domain experts. However, the process depends on the availability of experts and can be time-consuming to perform, making it impractical for large scale or open source projects that may curate many thousands of outstanding tasks. This paper reports on a full study to investigate the feasibility of using crowd workers supplied with limited information about a task to provide comparably accurate estimates using Planning Poker. We describe the design of a Crowd Planning Poker (CPP) process implemented on Amazon Mechanical Turk and the results of a substantial set of trials, involving more than 5000 crowd workers and 39 diverse software tasks. Our results show that a carefully organised and selected crowd of workers can produce effort estimates that are of similar accuracy to those of a single expert.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132037494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}