Marco Raglianti, Csaba Nagy, Roberto Minelli, Michele Lanza
Modern communication platforms used in software development host daily conversations among developers and users about a wide range of topics pertaining to software systems, such as language features, APIs, code artifacts like classes and methods, design patterns, usage examples, code reviews, and bug reporting and fixing. Discord servers are one kind of virtual community hub that has seen a steep rise in popularity as a means of coordination and aggregation for communities of developers. Although Discord supports filter-based search, the sheer volume, velocity, and small granularity of single messages make it hard to find useful results, let alone complete discussions revolving around particular themes. One reason is that a discussion, which we call a conversation, does not exist as an explicit concept in Discord. We argue that extracting and analyzing such conversations can fruitfully aid program comprehension. We present an approach that reconstructs the conversations that take place on a software community Discord server, focusing on software-related conversations and binding them to the discussed artifacts. Leveraging our approach, we built a tool that enables the interactive exploration of the conversations' contents. We illustrate its usefulness through a number of examples that highlight how the insights obtained serve as an additional form of software documentation and program comprehension aid.
{"title":"Using Discord Conversations as Program Comprehension Aid","authors":"Marco Raglianti, Csaba Nagy, Roberto Minelli, Michele Lanza","doi":"10.1145/3524610.3528388","DOIUrl":"https://doi.org/10.1145/3524610.3528388","journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)"},"publicationDate":"2022-05-01"}
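To make the idea of reconstructing conversations from a flat message stream concrete, here is a minimal sketch that groups messages with a simple time-gap heuristic. It is not the authors' approach, which the abstract does not detail; the `Message` type, the `group_conversations` function, and the 10-minute threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Message:
    # Hypothetical minimal message record; real Discord exports carry more fields.
    author: str
    sent_at: datetime
    text: str


def group_conversations(messages, max_gap=timedelta(minutes=10)):
    """Split messages into conversations wherever the silence exceeds max_gap.

    The gap threshold is an assumed heuristic, not a value from the paper.
    """
    conversations = []
    for msg in sorted(messages, key=lambda m: m.sent_at):
        last = conversations[-1][-1] if conversations else None
        if last is not None and msg.sent_at - last.sent_at <= max_gap:
            conversations[-1].append(msg)  # continues the current conversation
        else:
            conversations.append([msg])    # starts a new conversation
    return conversations
```

A heuristic like this ignores interleaved conversations in the same channel, which is one reason a dedicated reconstruction approach is needed.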
Just-In-Time (JIT) defect prediction aims to automatically predict whether a commit is defective or not, and has been widely studied in recent years. In general, most studies can be classified into two categories: 1) simple models using traditional machine learning classifiers with hand-crafted features, and 2) complex models using deep learning techniques to automatically extract features. Hand-crafted features used by simple models are based on expert knowledge but may not fully represent the semantic meaning of the commits. On the other hand, deep learning-based features used by complex models represent the semantic meaning of commits but may not reflect useful expert knowledge. Simple models and complex models thus seem complementary to some extent. To utilize the advantages of both, we propose a combined model, SimCom, that fuses the prediction scores of one simple and one complex model. The experimental results show that our approach can significantly outperform the state-of-the-art by 6.0-18.1%. In addition, our experimental results confirm that the simple model and complex model are complementary to each other.
{"title":"Simple or Complex? Together for a More Accurate Just-In-Time Defect Predictor","authors":"Xin Zhou, Donggyun Han, David Lo","doi":"10.1145/3524610.3527910","DOIUrl":"https://doi.org/10.1145/3524610.3527910","journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)"},"publicationDate":"2022-05-01"}
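Fusing the prediction scores of two models can be sketched as a late-fusion step over per-commit probabilities. The abstract does not say which fusion rule SimCom uses, so the weighted average below, the function names, and the 0.5 defaults are assumptions for illustration only.

```python
def fuse_scores(simple_scores, complex_scores, weight=0.5):
    """Combine two lists of per-commit defect probabilities.

    `weight` is the share given to the simple model; a plain average
    (weight=0.5) is an assumed default, not the paper's setting.
    """
    if len(simple_scores) != len(complex_scores):
        raise ValueError("score lists must have the same length")
    return [
        weight * s + (1.0 - weight) * c
        for s, c in zip(simple_scores, complex_scores)
    ]


def predict_defective(fused_scores, threshold=0.5):
    """Label a commit as defective when its fused score reaches the threshold."""
    return [score >= threshold for score in fused_scores]
```

Late fusion of this kind lets each model be trained independently, so the two feature pipelines never need to share a representation.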
Analyzing heap dumps containing complex dynamic data structures is essential when debugging modern software systems. However, existing tools for visualizing memory graphs can neither deal with corrupt structures such as binary trees exhibiting cycles, nor do they offer adequate abstractions when being confronted with large heaps. This paper presents MGE (Memory Graph Explorer), a memory analyzer and visualizer that combines a novel memory graph abstraction with an interactive visualization. MGE borrows ideas from separation logic and shape analysis to reveal relationships between memory nodes, name recognized structures such as doubly-linked lists and binary trees, and summarize complex structures. This summarization works for corrupt data structures, too, and is particularly powerful for large, nested structures due to its support for interactive (un)folding. MGE's utility for aiding program comprehension is illustrated by real-world and textbook examples and contrasted with existing debuggers.
{"title":"Shape-Analysis Driven Memory Graph Visualization","authors":"Jan H. Boockmann, Gerald Lüttgen","doi":"10.1145/3524610.3527913","DOIUrl":"https://doi.org/10.1145/3524610.3527913","journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)"},"publicationDate":"2022-05-01"}
Understanding the correct API usage sequences is one of the most important tasks for programmers when they work with unfamiliar libraries. However, programmers often encounter obstacles to finding the appropriate information due to either poor-quality API documentation or ineffective query-based search strategies. To help solve this issue, researchers have proposed various methods to suggest a sequence of APIs given a natural language query representing the programmer's information need. Among such efforts, Gu et al. adopted a deep learning method, in particular an RNN Encoder-Decoder architecture, to perform this task and obtained promising results on common APIs in Java. In this work, we aim to reproduce their results and apply the same methods to APIs in Python. Additionally, we compare the performance with a more recent Transformer-based method, i.e., CodeBERT, on the same task. Our experiment reveals a clear drop in performance measures when careful data cleaning is performed. Owing to its pretraining on a large number of source code files and its effective encoding technique, CodeBERT outperforms the method by Gu et al. to a large extent.
{"title":"Deep API Learning Revisited","authors":"James Martin, Jinrong Guo","doi":"10.1145/3524610.3527872","DOIUrl":"https://doi.org/10.1145/3524610.3527872","journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)"},"publicationDate":"2022-05-01"}
When changing code, developers sometimes neglect to update the related comments, leaving them inconsistent or outdated. Such comments increase the cost of program understanding and greatly reduce software maintainability. Researchers have put forward some solutions, such as CUP and HEBCUP, which update comments efficiently for simple code changes (i.e., modifying a single token) but do not perform well enough on complex ones. In this paper, we propose an approach named HatCUP (Hybrid Analysis and Attention based Comment UPdater) to provide a new mechanism for the comment updating task. HatCUP pays attention to hybrid analysis and information. First, HatCUP considers the code structure change information and introduces a structure-guided attention mechanism combined with code change graph analysis and optimistic data flow dependency analysis. With a generally popular RNN-based encoder-decoder architecture, HatCUP takes the actions of the code edits, the syntactic, semantic, and structural code changes, and the old comments as inputs and generates a structural representation of the changes in the current code snippet. Furthermore, instead of directly generating new comments, HatCUP proposes a new edit or non-edit mechanism to mimic human editing behavior, by generating a sequence of edit actions and constructing a modified RNN model to integrate newly developed components. Evaluation on a popular dataset demonstrates that HatCUP outperforms the state-of-the-art deep learning-based approach (CUP) by 53.8% for accuracy, 31.3% for recall and 14.3% for METEOR of the original metrics. Compared with the heuristic-based approach (HEBCUP), HatCUP also shows better overall performance.
{"title":"HatCUP: Hybrid Analysis and Attention based Just-In-Time Comment Updating","authors":"Hongquan Zhu, Xincheng He, Lei Xu","doi":"10.1145/3524610.3527901","DOIUrl":"https://doi.org/10.1145/3524610.3527901","journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)"},"publicationDate":"2022-05-01"}
The Structured Query Language (SQL) is a widely taught database query language in computer science, data science, and software engineering programs. While highly expressive, SQL is challenging for novices to learn. Various studies have explored the errors and mistakes that SQL users make. Specific attributes of SQL code, such as the number of tables and the degree of nesting, have been found to impact its understandability and maintainability. Furthermore, prior studies have shown that novices have significant issues using SQL correctly, due to factors such as expressive ease, existing knowledge and misconceptions, and the impact of cognitive load. In this paper we identify another factor: self-inflicted query complexity, where users hinder their own problem-solving process. We analyse 8K intermediate and final student attempts at six SQL exercises, approaching complexity from four perspectives: correctness, execution order, edit distance and query intricacy. Through our analyses, we find that our students are hindered in their query formulation process by mismanaging complexity through writing overly elaborate queries containing unnecessary elements, overusing brackets and nesting, and incrementally building queries with persistent errors.
{"title":"So many brackets! An analysis of how SQL learners (mis)manage complexity during query formulation","authors":"Daphne Miedema, G. Fletcher, Efthimia Aivaloglou","doi":"10.1145/3524610.3529158","DOIUrl":"https://doi.org/10.1145/3524610.3529158","journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)"},"publicationDate":"2022-05-01"}
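Two of the perspectives named in this abstract, edit distance and bracket nesting, are easy to illustrate on a sequence of query attempts. The sketch below uses character-level Levenshtein distance and a naive parenthesis counter; the abstract does not state how the authors computed either measure, so both functions and their names are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def attempt_deltas(attempts):
    """Edit distance between each attempt and the one that follows it."""
    return [edit_distance(x, y) for x, y in zip(attempts, attempts[1:])]


def max_nesting_depth(query):
    """Deepest level of parentheses; naive, it does not skip string literals."""
    depth = deepest = 0
    for char in query:
        if char == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif char == ")":
            depth = max(depth - 1, 0)
    return deepest
```

Small deltas across many consecutive attempts combined with a growing nesting depth would be one possible signal of the incremental, bracket-heavy editing the abstract describes.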
Asaf Etgar, Ram Friedman, Shaked Haiman, Dana Perez, D. Feitelson
Memorable function and variable names are useful for developers: they reduce the need to re-check how objects are named when one wants to use them, and they enhance comprehension when encountered while reading code. We look at the possible interplay between the information contained in names and how memorable they are. We show in two independent experiments involving a total of 190 subjects that informative names are usually easier to recollect than similar-length names which contain less focused information. Interestingly, we find that less-experienced and female participants are better at remembering the less informative names. We also find that short names, which are not just abbreviated but actually contain less information, are significantly more memorable. Hence a good choice would be to use the shortest name that includes the most focused and pertinent information.
{"title":"The Effect of Information Content and Length on Name Recollection","authors":"Asaf Etgar, Ram Friedman, Shaked Haiman, Dana Perez, D. Feitelson","doi":"10.1145/3524610.3529159","DOIUrl":"https://doi.org/10.1145/3524610.3529159","journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)"},"publicationDate":"2022-05-01"}
Yang Zhao, Ran Mo, Yao Zhang, Siyuan Zhang, Pu Xiong
Microservice is an architecture style that decomposes complex software into loosely coupled services, which can be developed, maintained, and deployed independently. In recent years, the microservice architecture has been drawing more and more attention from both industrial and academic communities. Many companies, such as Google, Netflix, Amazon, and IBM, have applied the microservice architecture in their projects. Researchers have also studied microservices in different directions, such as microservice extraction, fault localization, and code quality analysis. Recent work has shown that cross-service code clones are prevalent in microservice projects and have caused considerable co-modifications among different services, which undermines the independence of microservices. But there is no systematic study to reveal the underlying reasons for the emergence of such clones. In this paper, we first build a dataset consisting of 2,722 pairs of cross-service clones from 22 open-source microservice projects. Then we manually inspect the implementations of files and methods involved in cross-service clones to understand why the clones are introduced. In the file-level analysis, we categorize files into three types: DPFile (Data-processing File), DRFile (Data-related File), and DIFile (Data-irrelevant File), and show that DRFiles are more likely to encounter cross-service clones. For each type of file, we further classify the files into specific cases. Each case describes the characteristics of the involved files and why the clones happen. In the method-level analysis, we extract information from the code of the involved methods. On this basis, we propose a catalog containing 4 categories with 10 subcategories of method-level implementations that result in cross-service clones. We believe our analyses provide fundamental knowledge of cross-service clones, which can help developers better manage and resolve such clones in microservice projects.
{"title":"Exploring and Understanding Cross-service Code Clones in Microservice Projects","authors":"Yang Zhao, Ran Mo, Yao Zhang, Siyuan Zhang, Pu Xiong","doi":"10.1145/3524610.3527925","DOIUrl":"https://doi.org/10.1145/3524610.3527925","journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)"},"publicationDate":"2022-05-01"}
The most common reason for Continuous Integration (CI) builds to break is failing tests. When a build breaks, a developer often has to scroll through hundreds to thousands of log lines to find which test is failing and why. Finding the issue is a tedious process that relies on a developer's experience and increases the cost of software testing. We investigate how presenting different kinds of contextual information about CI builds in the Integrated Development Environment (IDE) impacts the time developers take to fix a broken build. Our IntelliJ plugin TESTAXIS surfaces additional information such as a unique view of the code under test that was changed leading up to the build failure. We conduct a user experiment and show that TESTAXIS helps developers fix failing tests 13.4% to 48.6% faster. The participants found the features of TESTAXIS useful and would incorporate it in their development workflow to save time. With TESTAXIS we take an important step towards removing the need to manually inspect build logs and bringing CI build results to the IDE, ultimately saving developers time.
{"title":"Fixing Continuous Integration Tests From Within the IDE With Contextual Information","authors":"Casper Boone, C. Brandt, A. Zaidman","doi":"10.1145/3524610.3527908","DOIUrl":"https://doi.org/10.1145/3524610.3527908","journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)"},"publicationDate":"2022-05-01"}
Reem S. Alsuhaibani, Christian D. Newman, M. J. Decker, M. Collard, Jonathan I. Maletic
An approach is presented to automatically assess the quality of method names by providing a score and feedback. The approach implements ten method naming standards to evaluate the names. The naming standards are taken from work that validated the standards via a large survey of software professionals. Natural language processing techniques such as part-of-speech tagging, identifier splitting, and dictionary lookup are required to implement the standards. The approach is evaluated by first manually constructing a large golden set of method names. Each method name is rated by several developers and labeled as conforming to each standard or not. These ratings allow for comparing the results of the approach against expert assessment. Additionally, the approach is applied to several systems and the results are manually inspected for accuracy.
{"title":"An Approach to Automatically Assess Method Names","authors":"Reem S. Alsuhaibani, Christian D. Newman, M. J. Decker, M. Collard, Jonathan I. Maletic","doi":"10.1145/3524610.3527780","DOIUrl":"https://doi.org/10.1145/3524610.3527780","journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)"},"publicationDate":"2022-05-01"}
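Identifier splitting and dictionary lookup, two of the techniques this abstract mentions, can be sketched in a few lines. The abstract does not list the ten naming standards, so the code below implements none of them; `split_identifier`, `unknown_words`, and the tiny word set in the usage note are hypothetical and only show the kind of preprocessing such checks rely on.

```python
import re

# Matches, in order: an acronym run (e.g. "HTTP"), a capitalized or
# lowercase word (e.g. "Response", "parse"), or a run of digits.
_WORD_PATTERN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_identifier(name):
    """Split a method name on underscores and camelCase boundaries."""
    words = []
    for part in name.split("_"):
        words.extend(_WORD_PATTERN.findall(part))
    return [word.lower() for word in words]


def unknown_words(name, dictionary):
    """Return the words of `name` that are missing from `dictionary`.

    `dictionary` is any set of lowercase words supplied by the caller.
    """
    return [word for word in split_identifier(name) if word not in dictionary]
```

For instance, with the assumed word set `{"get", "user", "name"}`, `unknown_words("getUsrName", ...)` flags `usr` as a likely abbreviation, which a scoring tool could then report as feedback.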