Developers rely on bug reports to fix bugs. Bug reports are usually stored and managed in bug tracking systems. Because reporters have different expression habits, they may describe the same bug in different ways, so a bug tracking system often contains many duplicate bug reports. Automatically detecting these duplicates would save a large amount of bug-analysis effort. Prior studies have found that deep-learning techniques are effective for duplicate bug report detection. Inspired by recent Natural Language Processing (NLP) research, in this paper we propose a duplicate bug report detection approach based on Dual-Channel Convolutional Neural Networks (DC-CNN). We present a novel bug report pair representation, a dual-channel matrix, formed by concatenating the two single-channel matrices that represent the individual reports. Such pairs are fed to a CNN model to capture the correlated semantic relationships between the reports, and the resulting association features are used to classify whether a pair of bug reports is duplicate or not. We evaluate our approach on datasets from three open-source projects (OpenOffice, Eclipse, and NetBeans) and on a larger combined dataset; classification accuracy reaches 0.9429, 0.9685, 0.9534, and 0.9552, respectively. This performance outperforms two state-of-the-art approaches that also use deep-learning techniques. The results indicate that our dual-channel matrix representation is effective for duplicate bug report detection.
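The dual-channel pair representation described above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the embedding dimension, sequence length, and random toy vectors are all hypothetical stand-ins for trained word vectors.

```python
import random

random.seed(0)

EMBED_DIM = 4   # toy embedding size (hypothetical; real word vectors are larger)
MAX_LEN = 6     # pad/truncate every report to a fixed token length
_VOCAB = {}     # lazily assigned toy vectors, one per token

def embed(tokens):
    """Map tokens to toy embedding vectors, yielding one single-channel matrix."""
    matrix = []
    for tok in tokens[:MAX_LEN]:
        if tok not in _VOCAB:
            _VOCAB[tok] = [random.uniform(-1, 1) for _ in range(EMBED_DIM)]
        matrix.append(_VOCAB[tok])
    while len(matrix) < MAX_LEN:
        matrix.append([0.0] * EMBED_DIM)  # zero padding up to MAX_LEN
    return matrix

def dual_channel(report_a, report_b):
    """Stack the two single-channel report matrices into one two-channel input,
    analogous to the color channels of an image fed to a CNN."""
    return [embed(report_a.split()), embed(report_b.split())]

pair = dual_channel("crash when saving file", "app crashes on file save")
```

The resulting `pair` has shape 2 x MAX_LEN x EMBED_DIM, so a standard 2D convolution can slide over both reports at once and pick up cross-report correlations.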
Jianjun He, Ling Xu, Meng Yan, Xin Xia, Yan Lei. "Duplicate Bug Report Detection Using Dual-Channel Convolutional Neural Networks." 2020 IEEE/ACM 28th International Conference on Program Comprehension (ICPC), July 13, 2020. DOI: https://doi.org/10.1145/3387904.3389263
Akira Fujimoto, Yoshiki Higo, Junnosuke Matsumoto, S. Kusumoto
In software development, developers often need to understand source code differences in their activities. GumTree is a tool that detects tree-based source code differences. GumTree constructs abstract syntax trees from the source code before and after a given change, and then it identifies inserted/deleted/moved subtrees and updated nodes; source code differences are reported in terms of these four kinds of information. However, GumTree calculates the difference for each file individually, so it cannot detect moves of code fragments across files. In this research, we propose (1) to construct a single abstract syntax tree from all source files included in a project and (2) to perform a staged tree matching to detect across-file code moves efficiently and accurately. In a pilot experiment on open source projects, our technique detected code moves across files in all the projects, 76,600 in total.
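The core idea of merging per-file trees and matching identical subtrees can be sketched as follows. The tuple-based toy trees and plain hash matching below are simplifications assumed for illustration; GumTree's real staged matching works on full ASTs with move/update heuristics.

```python
# Toy trees: (label, [children]). A subtree that disappears from one file and
# reappears in another between two versions is reported as an across-file move.

def subtree_hashes(tree, path, out):
    """Collect a structural hash for every subtree, remembering its file path."""
    label, children = tree
    h = hash((label, tuple(subtree_hashes(c, path, out) for c in children)))
    out.setdefault(h, []).append(path)
    return h

def across_file_moves(old_files, new_files):
    """Return subtree hashes whose set of containing files changed completely
    between the old and new version (i.e., the code moved across files)."""
    old, new = {}, {}
    for f, t in old_files.items():
        subtree_hashes(t, f, old)
    for f, t in new_files.items():
        subtree_hashes(t, f, new)
    return [h for h in old if h in new and set(old[h]).isdisjoint(new[h])]

helper = ("method", [("stmt", []), ("stmt", [])])
old_v = {"A.java": ("class", [helper]), "B.java": ("class", [])}
new_v = {"A.java": ("class", []), "B.java": ("class", [helper])}
moves = across_file_moves(old_v, new_v)  # the helper moved from A.java to B.java
```

Matching hashes project-wide, rather than file by file, is what makes the cross-file move visible at all.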
"Staged Tree Matching for Detecting Code Move across Files." ICPC 2020. DOI: https://doi.org/10.1145/3387904.3389289
Background: How programmers comprehend source code depends on several factors, including the source code itself and the programmer. Recent studies showed that novice programmers tend to read source code more like natural language text, whereas experts tend to follow the program execution flow. However, it is unknown how the linearity of source code and the comprehension strategy influence programmers' linearity of reading order. Objective: We replicate two previous studies with the aim of additionally providing empirical evidence on the effects of the linearity of source code and programmers' comprehension strategy on the linearity of reading order. Methods: To understand the effects of linearity of source code on reading order, we conducted a non-exact replication of the studies by Busjahn et al. and Peachock et al., which compared the reading order of novice and expert programmers. Like the original studies, we used an eye-tracker to record the eye movements of participants (12 novice and 19 intermediate programmers). Results: In line with Busjahn et al. (but different from Peachock et al.), we found that experience modulates the reading behavior of participants. However, the linearity of source code has an even stronger effect on reading order than experience, whereas the comprehension strategy has a minor effect. Implications: Our results demonstrate that studies on the reading behavior of programmers must carefully select source code snippets to control for confounding factors. Furthermore, we identify a need for further studies on how programmers should structure source code to align it with their natural reading behavior and thus ease program comprehension.
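The notion of "linearity of reading order" can be made concrete with a toy metric. The fraction-of-forward-transitions rule below is a simplified, hypothetical stand-in for the saccade-based measures typically used in such eye-tracking studies, shown only to illustrate the concept.

```python
def linearity(fixated_lines):
    """Fraction of consecutive fixation transitions that move to the same or a
    later source line. 1.0 = perfectly linear, text-like reading."""
    if len(fixated_lines) < 2:
        return 1.0
    forward = sum(1 for a, b in zip(fixated_lines, fixated_lines[1:]) if b >= a)
    return forward / (len(fixated_lines) - 1)

# A top-to-bottom reader vs. a reader jumping along the execution flow:
novice_like = linearity([1, 2, 3, 4, 5])   # 1.0
expert_like = linearity([1, 7, 2, 8, 3])   # 0.5
```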
Norman Peitek, J. Siegmund, S. Apel. "What Drives the Reading Order of Programmers? An Eye Tracking Study." ICPC 2020. DOI: https://doi.org/10.1145/3387904.3389279
Open Source Software (OSS) projects start with an initial vocabulary, often determined by the first generation of developers. This vocabulary, embedded in code identifier names and internal code comments, goes through multiple rounds of change, influenced by the interrelated patterns of human (e.g., developers joining and departing) and system (e.g., maintenance activities) interactions. Capturing the dynamics of this change is crucial for understanding and synthesizing code changes over time. However, existing code evolution analysis tools, available in modern version control systems such as GitHub and SourceForge, often overlook the linguistic aspects of code evolution. To bridge this gap, in this paper, we propose to study code evolution in OSS projects through the lens of developers' language, also known as code lexicon. Our analysis is conducted using 32 OSS projects sampled from a broad range of application domains. Our results show that different maintenance activities impact code lexicon differently. These insights lay out a preliminary foundation for modeling the linguistic history of OSS projects. In the long run, this foundation will be utilized to provide support for basic program comprehension tasks and help researchers gain new insights into the complex interplay between linguistic change and various system and human aspects of OSS development.
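The code lexicon described above can be illustrated with a small sketch: extract identifier-like tokens, split them on underscores and camelCase boundaries, and diff the resulting term sets across versions. The splitting rules are a common convention assumed here for illustration, not necessarily the paper's exact extraction procedure.

```python
import re

def lexicon(source):
    """Return the set of lowercased lexicon terms embedded in identifiers."""
    terms = set()
    for ident in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source):
        # split snake_case on "_" and camelCase at lower->upper boundaries
        for part in re.split(r"_|(?<=[a-z])(?=[A-Z])", ident):
            if part:
                terms.add(part.lower())
    return terms

old = "int maxRetryCount; void resetCounter() {}"
new = "int maxAttempts; void resetCounter() {}"
added = lexicon(new) - lexicon(old)      # terms a maintenance change introduced
removed = lexicon(old) - lexicon(new)    # terms it retired
```

Tracking `added`/`removed` per commit, bucketed by maintenance activity, is one simple way to observe how different activities reshape the vocabulary over time.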
Miroslav Tushev, Anas Mahmoud. "Linguistic Documentation of Software History." ICPC 2020. DOI: https://doi.org/10.1145/3387904.3389288
Program comprehension is one of the important cognitive processes in software maintenance. The process typically involves diverse mental activities, such as understanding source code, library usages, and requirements. Systematic support could be improved if tools were aware of such fine-grained mental activities during program comprehension. Here we investigate whether biometric data vary with such mental activity classes and conduct an experiment with program comprehension tasks involving multiple documents. We successfully classified the success/failure of the tasks with 85.2% accuracy from electroencephalogram (EEG) data combined with focused document types. This result suggests that our metrics based on EEG and focused document types might help detect developers' diverse mental activities triggered by different documents.
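The classification setup can be sketched in miniature: each trial becomes a feature vector of EEG band powers plus an encoding of the document type in focus, and a classifier predicts task success. Everything below (the features, the numbers, the nearest-centroid rule) is an entirely hypothetical illustration, not the study's actual model.

```python
def centroid(rows):
    """Mean vector of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def classify(sample, success_rows, failure_rows):
    """Nearest-centroid rule: label a trial by its closer class centroid."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    cs, cf = centroid(success_rows), centroid(failure_rows)
    return "success" if dist(sample, cs) <= dist(sample, cf) else "failure"

# features: [alpha power, beta power, focused-on-code, focused-on-requirements]
success = [[0.2, 0.8, 1, 0], [0.3, 0.9, 1, 0]]
failure = [[0.9, 0.2, 0, 1], [0.8, 0.3, 0, 1]]
```

Appending the focused-document-type features to the raw EEG features is the "combined" aspect the abstract refers to: the same EEG signal can be disambiguated by what the developer was looking at.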
Toyomi Ishida, H. Uwano, Yoshiharu Ikutani. "Combining Biometric Data with Focused Document Types Classifies a Success of Program Comprehension." ICPC 2020. DOI: https://doi.org/10.1145/3387904.3389291
Emanuele Iannone, Fabiano Pecorelli, D. D. Nucci, Fabio Palomba, A. D. Lucia
Mobile applications are a major means of performing daily actions, including social and emergency connectivity. However, their usability is threatened by energy consumption, which may be affected by code smells, i.e., symptoms of bad implementation and design practices. In particular, researchers have derived a set of mobile-specific code smells that increase the energy consumption of mobile apps; removing such smells through refactoring can mitigate the problem. In this paper, we extend and revise ADOCTOR, a tool that we previously implemented to identify energy-related smells. On the one hand, we present and implement automated refactoring solutions for those smells. On the other hand, we make the tool completely open-source and available in Android Studio as a plugin published in the official store. A video showing the tool in action is available at: https://www.youtube.com/watch?v=1c2EhVXiKis
"Refactoring Android-specific Energy Smells: A Plugin for Android Studio." ICPC 2020. DOI: https://doi.org/10.1145/3387904.3389298
Markus Raab, B. Denner, Stefan Hahnenberg, Jürgen Cito
The behavior of software is often governed by a large set of configuration settings, distributed over several stacks in the software system. These settings are often manifested as plain-text files with differing formats and syntax. Configuration management systems are introduced to manage the complexity of provisioning and distributing configuration in large-scale software. Globally patching configuration settings in these systems, however, requires text manipulation or external templating mechanisms that paradoxically lead to increased complexity and, eventually, to misconfigurations. These issues manifest through crashes or bugs that are often only discovered at runtime. We introduce a framework called Elektra, which integrates a centralized configuration space into configuration management systems to avoid syntax errors, avert the overriding of default values, and increase developer productivity. Elektra mounts different configuration files into a common, globally shared data structure to abstract away the intricate details of file formats and configuration syntax, and it introduces a unified way to specify and patch configuration settings as key/value pairs. In this work, we integrate Elektra into the configuration management tool Puppet. Additionally, we present a user study with 14 developers showing that Elektra enables significant productivity improvements over existing configuration management concepts: participants performed significantly faster with Elektra in three representative configuration-manipulation scenarios, compared to other general-purpose configuration manipulation methods.
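The "mount files into one shared key/value space" idea can be sketched as follows. This is a toy model of the concept, not Elektra's real API: the class, key layout, and `patch` method are assumptions made for illustration.

```python
import configparser
import json

class ConfigSpace:
    """Toy unified key/value space over heterogeneous config file formats."""

    def __init__(self):
        self.store = {}

    def mount_ini(self, prefix, text):
        """Mount an INI file under prefix/section/key."""
        cp = configparser.ConfigParser()
        cp.read_string(text)
        for section in cp.sections():
            for key, value in cp[section].items():
                self.store[f"{prefix}/{section}/{key}"] = value

    def mount_json(self, prefix, text):
        """Mount a flat JSON object under prefix/key."""
        for key, value in json.loads(text).items():
            self.store[f"{prefix}/{key}"] = str(value)

    def patch(self, suffix, value):
        """Globally set every key ending in `suffix`, regardless of which
        file or format originally defined it -- no text templating needed."""
        for key in self.store:
            if key.endswith(suffix):
                self.store[key] = value

space = ConfigSpace()
space.mount_ini("app", "[server]\nport = 8080\n")
space.mount_json("cache", '{"port": 6379}')
space.patch("/port", "9000")   # one patch reaches both the INI and JSON settings
```

Because both files are visible through one namespace, the global patch cannot introduce the syntax errors that text-level edits risk.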
"Unified Configuration Setting Access in Configuration Management Systems." ICPC 2020. DOI: https://doi.org/10.1145/3387904.3389257
Sean Stapleton, Yashmeet Gambhir, Alexander LeClair, Zachary Eberhart, Westley Weimer, Kevin Leach, Yu Huang
Software developers spend a great deal of time reading and understanding code that is poorly-documented, written by other developers, or developed using differing styles. During the past decade, researchers have investigated techniques for automatically documenting code to improve comprehensibility. In particular, recent advances in deep learning have led to sophisticated summary generation techniques that convert functions or methods to simple English strings that succinctly describe that code's behavior. However, automatic summarization techniques are assessed using internal metrics such as BLEU scores, which measure natural language properties in translational models, or ROUGE scores, which measure overlap with human-written text. Unfortunately, these metrics do not necessarily capture how machine-generated code summaries actually affect human comprehension or developer productivity. We conducted a human study involving both university students and professional developers (n = 45). Participants reviewed Java methods and summaries and answered established program comprehension questions. In addition, participants completed coding tasks given summaries as specifications. Critically, the experiment controlled the source of the summaries: for a given method, some participants were shown human-written text and some were shown machine-generated text. We found that participants performed significantly better (p = 0.029) using human-written summaries versus machine-generated summaries. However, we found no evidence to support that participants perceive human- and machine-generated summaries to have different qualities. In addition, participants' performance showed no correlation with the BLEU and ROUGE scores often used to assess the quality of machine-generated summaries. These results suggest a need for revised metrics to assess and guide automatic summarization techniques.
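The internal metrics in question reward n-gram overlap with a reference summary. The sketch below computes a simple unigram-precision score (a toy approximation of BLEU, without higher-order n-grams or the brevity penalty) to show how overlap can diverge from usefulness to a human reader.

```python
from collections import Counter

def unigram_precision(candidate, reference):
    """Fraction of candidate tokens that also appear in the reference,
    with clipped counts as in BLEU's modified precision."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum(min(n, ref[w]) for w, n in cand.items())
    return overlap / max(1, sum(cand.values()))

ref = "returns the index of the first matching element"
good_paraphrase = "finds where the first match occurs"    # helpful, low overlap
word_salad = "returns the index the first the element"    # high overlap, unhelpful
```

Here the incoherent `word_salad` outscores the genuinely informative paraphrase, which is exactly the mismatch between metric scores and human comprehension that the study's results point to.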
"A Human Study of Comprehension and Code Summarization." ICPC 2020. DOI: https://doi.org/10.1145/3387904.3389258
M. Dias, D. Orellana, S. Vidal, Leonel Merino, Alexandre Bergel
To characterize the building blocks of a legacy software system (e.g., structure, dependencies), programmers usually spend a long time navigating its source code. Yet, modern integrated development environments (IDEs) do not provide appropriate means to efficiently achieve complex software comprehension tasks. To address this need, we present Hunter, a tool for the visualization of JavaScript applications. Hunter visualizes source code through a set of coordinated views that include a node-link diagram depicting the dependencies among the components of a system, and a treemap that helps programmers orient themselves when navigating its structure. In this paper, we report on a controlled experiment that evaluates Hunter. We asked 16 participants to solve a set of software comprehension tasks and assessed their effectiveness in terms of (i) user performance (i.e., completion time, accuracy, and attention) and (ii) user experience (i.e., emotions, usability). We found that when using Hunter, programmers required significantly less time to complete various software comprehension tasks and achieved significantly higher accuracy. We also found that the node-link diagram panel attracts most of programmers' attention in Hunter, whereas the source code panel does so in Visual Studio Code. Moreover, programmers considered Hunter to provide a good user experience.
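Hunter's node-link view needs dependency edges between components; a minimal sketch of how such edges could be extracted from JavaScript sources is shown below. The naive regexes and the file contents are assumptions for illustration; a real tool would parse the module syntax properly.

```python
import re

# Matches `import ... from '<module>'` and `require('<module>')` on one line.
IMPORT_RE = re.compile(r"""(?:import\s.*?from\s+|require\()\s*['"]([^'"]+)['"]""")

def dependency_edges(files):
    """Return (importer, imported-module) pairs for a {path: source} mapping,
    i.e., the edge list behind a node-link dependency diagram."""
    edges = []
    for path, source in files.items():
        for target in IMPORT_RE.findall(source):
            edges.append((path, target))
    return edges

files = {
    "app.js": "import {render} from './view';\nconst db = require('./db');",
    "view.js": "export function render() {}",
}
edges = dependency_edges(files)
```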
"Evaluating a Visual Approach for Understanding JavaScript Source Code." ICPC 2020. DOI: https://doi.org/10.1145/3387904.3389275
Fabiano Pecorelli, Gemma Catolino, F. Ferrucci, A. D. Lucia, Fabio Palomba
Nowadays, mobile applications (a.k.a., apps) are used by over two billion users for every type of need, including social and emergency connectivity. Their pervasiveness in today's world has inspired the software testing research community in devising approaches to allow developers to better test their apps and improve the quality of the tests being developed. In spite of this research effort, we still notice a lack of empirical studies aiming at assessing the actual quality of test cases developed by mobile developers: this perspective could provide evidence-based findings on the current status of testing in the wild as well as on the future research directions in the field. As such, we performed a large-scale empirical study targeting 1,780 open-source Android apps and aiming at assessing (1) the extent to which these apps are actually tested, (2) how well-designed are the available tests, and (3) what is their effectiveness. The key results of our study show that mobile developers still tend not to properly test their apps. Furthermore, we discovered that the test cases of the considered apps have a low (i) design quality, both in terms of test code metrics and test smells, and (ii) effectiveness when considering code coverage as well as assertion density.
{"title":"Testing of Mobile Applications in the Wild: A large-Scale Empirical Study on Android Apps","authors":"Fabiano Pecorelli, Gemma Catolino, F. Ferrucci, A. D. Lucia, Fabio Palomba","doi":"10.1145/3387904.3389256","DOIUrl":"https://doi.org/10.1145/3387904.3389256","url":null,"abstract":"Nowadays, mobile applications (a.k.a., apps) are used by over two billion users for every type of need, including social and emergency connectivity. Their pervasiveness in today's world has inspired the software testing research community in devising approaches to allow developers to better test their apps and improve the quality of the tests being developed. In spite of this research effort, we still notice a lack of empirical studies aiming at assessing the actual quality of test cases developed by mobile developers: this perspective could provide evidence-based findings on the current status of testing in the wild as well as on the future research directions in the field. As such, we performed a large-scale empirical study targeting 1,780 open-source Android apps and aiming at assessing (1) the extent to which these apps are actually tested, (2) how well-designed are the available tests, and (3) what is their effectiveness. The key results of our study show that mobile developers still tend not to properly test their apps. 
Furthermore, we discovered that the test cases of the considered apps have a low (i) design quality, both in terms of test code metrics and test smells, and (ii) effectiveness when considering code coverage as well as assertion density.","PeriodicalId":231095,"journal":{"name":"2020 IEEE/ACM 28th International Conference on Program Comprehension (ICPC)","volume":"60 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120935440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
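The abstract above measures test effectiveness partly through assertion density, i.e., the proportion of assertion statements relative to the size of the test code. The paper does not specify its exact computation, so the following is only a minimal illustrative sketch of one common definition (assertion statements per non-blank line of test code), using a hypothetical `assertion_density` helper and a made-up JUnit-style test snippet:

```python
import re

def assertion_density(test_source: str) -> float:
    """Rough assertion density: JUnit-style assertion calls per
    non-blank line of test code. Illustrative only; the paper's
    exact metric definition may differ."""
    lines = [l for l in test_source.splitlines() if l.strip()]
    if not lines:
        return 0.0
    # Count lines invoking an assertion (assertEquals, assertTrue, ...)
    asserts = sum(1 for l in lines if re.search(r"\bassert\w*\s*\(", l))
    return asserts / len(lines)

# Made-up example test: 2 assertion lines out of 6 non-blank lines.
sample_test = """
@Test
public void testAdd() {
    Calculator c = new Calculator();
    assertEquals(4, c.add(2, 2));
    assertTrue(c.add(1, 1) > 0);
}
"""
print(round(assertion_density(sample_test), 2))  # → 0.33
```

A low value on such a metric would indicate tests that execute code but rarely check its results, which is one way the study's finding of low test effectiveness can manifest.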