2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC): Latest Articles

Using Discord Conversations as Program Comprehension Aid

2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)

Pub Date : 2022-05-01 DOI: 10.1145/3524610.3528388

Marco Raglianti, Csaba Nagy, Roberto Minelli, Michele Lanza

Modern communication platforms used in software development host daily conversations among developers and users about a wide range of topics pertaining to software systems, such as language features, APIs, code artifacts like classes and methods, design patterns, usage examples, code reviews, bug reporting and fixing. Discord servers are one of these virtual community hubs that have seen a steep rise in popularity, as coordination and aggregation means for communities of developers. Although Discord supports filter-based search functionalities, the sheer volume, velocity, and small granularity of single messages make it hard to find useful results, let alone complete discussions revolving around particular themes. One reason is that the concept of a discussion, which we call a conversation, does not exist as an explicit concept. We argue that extracting and analyzing such conversations can be used fruitfully to aid program comprehension. We present an approach that reconstructs the conversations that take place on a software community Discord server, focusing on software-related conversations: Our approach binds the conversations to the discussed artifacts. Leveraging our approach, we built a tool that enables the interactive exploration of the conversations' contents. We illustrate its usefulness through a number of examples that highlight how the insights obtained serve as an additional form of software documentation and program comprehension aid.
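
The abstract does not say how conversations are reconstructed. Purely to illustrate the problem of turning a flat message stream into conversations, the sketch below uses a time-gap heuristic. The `Message` type, the `group_by_time_gap` function, and the 10-minute threshold are assumptions made for this example and are not the authors' approach.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Message:
    # Hypothetical minimal message record; real Discord messages carry more fields.
    author: str
    timestamp: datetime
    text: str


def group_by_time_gap(messages, max_gap=timedelta(minutes=10)):
    """Split messages into conversations wherever the silence exceeds max_gap."""
    conversations = []
    for msg in sorted(messages, key=lambda m: m.timestamp):
        last = conversations[-1][-1] if conversations else None
        if last is not None and msg.timestamp - last.timestamp <= max_gap:
            # Close enough in time: continue the current conversation.
            conversations[-1].append(msg)
        else:
            # First message, or a long pause: start a new conversation.
            conversations.append([msg])
    return conversations
```

A heuristic like this ignores interleaved discussions and reply links, which is one reason a dedicated reconstruction approach is needed.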

Citations: 4

Simple or Complex? Together for a More Accurate Just-In-Time Defect Predictor

2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)

Pub Date : 2022-05-01 DOI: 10.1145/3524610.3527910

Xin Zhou, Donggyun Han, David Lo

Just-In-Time (JIT) defect prediction aims to automatically predict whether a commit is defective or not, and has been widely studied in recent years. In general, most studies can be classified into two categories: 1) simple models using traditional machine learning classifiers with hand-crafted features, and 2) complex models using deep learning techniques to automatically extract features. Hand-crafted features used by simple models are based on expert knowledge but may not fully represent the semantic meaning of the commits. On the other hand, deep learning-based features used by complex models represent the semantic meaning of commits but may not reflect useful expert knowledge. Simple models and complex models seem complementary to each other to some extent. To utilize the advantages of both simple and complex models, we propose a combined model, namely SimCom, that fuses the prediction scores of one simple and one complex model. The experimental results show that our approach can significantly outperform the state-of-the-art by 6.0-18.1%. In addition, our experimental results confirm that the simple model and complex model are complementary to each other.
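
The abstract states that the prediction scores of one simple and one complex model are fused, but not how. A minimal sketch of one common option, a weighted average of per-commit scores, is shown below. The function names, the equal default weight, and the 0.5 decision threshold are assumptions for illustration and may differ from what SimCom does.

```python
def fuse_scores(simple_scores, complex_scores, weight=0.5):
    """Late fusion: weighted average of two models' defect probabilities per commit."""
    if len(simple_scores) != len(complex_scores):
        raise ValueError("both models must score the same commits")
    return [
        weight * s + (1.0 - weight) * c
        for s, c in zip(simple_scores, complex_scores)
    ]


def predict_defective(fused_scores, threshold=0.5):
    """Label a commit as defective when its fused score reaches the threshold."""
    return [score >= threshold for score in fused_scores]
```

Fusing at the score level keeps the two models independent, so either one can be retrained or replaced without touching the other.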

Citations: 2

Shape-Analysis Driven Memory Graph Visualization

2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)

Pub Date : 2022-05-01 DOI: 10.1145/3524610.3527913

Jan H. Boockmann, Gerald Lüttgen

Analyzing heap dumps containing complex dynamic data structures is essential when debugging modern software systems. However, existing tools for visualizing memory graphs can neither deal with corrupt structures such as binary trees exhibiting cycles, nor do they offer adequate abstractions when being confronted with large heaps. This paper presents MGE (Memory Graph Explorer), a memory analyzer and visualizer that combines a novel memory graph abstraction with an interactive visualization. MGE borrows ideas from separation logic and shape analysis to reveal relationships between memory nodes, name recognized structures such as doubly-linked lists and binary trees, and summarize complex structures. This summarization works for corrupt data structures, too, and is particularly powerful for large, nested structures due to its support for interactive (un)folding. MGE's utility for aiding program comprehension is illustrated by real-world and textbook examples and contrasted with existing debuggers.
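
To make the notion of a corrupt structure concrete, the sketch below checks whether a memory graph that is supposed to be a binary tree contains a node reachable along more than one path (a cycle or a shared child). The dictionary encoding of the heap and the function name are assumptions for this example. It is a plain graph traversal, not MGE's abstraction, which the abstract says builds on separation logic and shape analysis.

```python
def find_tree_violations(heap, root):
    """Return node ids that are reached more than once when walking from root.

    heap maps a node id to a (left, right) pair of child ids, with None for
    a missing child. In a well-formed binary tree the result is empty.
    """
    seen = set()
    violations = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node is None:
            continue
        if node in seen:
            # Reached again: either a back edge (cycle) or a shared subtree.
            violations.append(node)
            continue
        seen.add(node)
        left, right = heap[node]
        stack.append(right)
        stack.append(left)
    return violations
```

For example, a heap in which a leaf points back to the root is reported with the root's id, while a healthy three-node tree yields an empty list.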

Citations: 1

Deep API Learning Revisited

2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)

Pub Date : 2022-05-01 DOI: 10.1145/3524610.3527872

James Martin, Jinrong Guo

Understanding the correct API usage sequences is one of the most important tasks for programmers when they work with unfamiliar libraries. However, programmers often encounter obstacles to finding the appropriate information due to either poor quality of API documentation or ineffective query-based searching strategy. To help solve this issue, researchers have proposed various methods to suggest the sequence of APIs given natural language queries representing the information needs from programmers. Among such efforts, Gu et al. adopted a deep learning method, in particular an RNN Encoder-Decoder architecture, to perform this task and obtained promising results on common APIs in Java. In this work, we aim to reproduce their results and apply the same methods for APIs in Python. Additionally, we compare the performance with a more recent Transformer-based method, i.e., CodeBERT, for the same task. Our experiment reveals a clear drop in performance measures when careful data cleaning is performed. Owing to the pretraining from a large number of source code files and effective encoding technique, CodeBERT outperforms the method by Gu et al., to a large extent.

Citations: 4

HatCUP: Hybrid Analysis and Attention based Just-In-Time Comment Updating

2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)

Pub Date : 2022-05-01 DOI: 10.1145/3524610.3527901

Hongquan Zhu, Xincheng He, Lei Xu

When changing code, developers sometimes neglect to update the related comments, leaving them inconsistent or outdated. Such comments increase the cost of program understanding and greatly reduce software maintainability. Researchers have put forward some solutions, such as CUP and HEBCUP, which update comments efficiently for simple code changes (i.e., modifying a single token), but are not good enough for complex ones. In this paper, we propose an approach named HatCUP (Hybrid Analysis and Attention based Comment UPdater) to provide a new mechanism for the comment updating task. HatCUP pays attention to hybrid analysis and information. First, HatCUP considers the code structure change information and introduces a structure-guided attention mechanism combined with code change graph analysis and optimistic data flow dependency analysis. With a generally popular RNN-based encoder-decoder architecture, HatCUP takes the action of the code edits, the syntax, semantics and structure code changes, and old comments as inputs and generates a structural representation of the changes in the current code snippet. Furthermore, instead of directly generating new comments, HatCUP proposes a new edit or non-edit mechanism to mimic human editing behavior, by generating a sequence of edit actions and constructing a modified RNN model to integrate newly developed components. Evaluation on a popular dataset demonstrates that HatCUP outperforms the state-of-the-art deep learning-based approach (CUP) by 53.8% for accuracy, 31.3% for recall and 14.3% for METEOR of the original metrics. Compared with the heuristic-based approach (HEBCUP), HatCUP also shows better overall performance.
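
The idea of representing a comment update as a sequence of edit actions, rather than as freshly generated text, can be illustrated with Python's standard `difflib`. The sketch below derives keep/delete/insert actions from a known old and new comment and replays them. The action names and both functions are assumptions for this example. HatCUP predicts such actions with a neural model, which this sketch does not attempt.

```python
from difflib import SequenceMatcher


def comment_edit_actions(old_comment, new_comment):
    """Describe how to turn old_comment into new_comment as token-level actions."""
    old_tokens = old_comment.split()
    new_tokens = new_comment.split()
    matcher = SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    actions = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            actions.extend(("keep", tok) for tok in old_tokens[i1:i2])
        else:
            # "replace", "delete" and "insert" all reduce to deletes plus inserts.
            actions.extend(("delete", tok) for tok in old_tokens[i1:i2])
            actions.extend(("insert", tok) for tok in new_tokens[j1:j2])
    return actions


def apply_edit_actions(actions):
    """Replay the actions: kept and inserted tokens form the updated comment."""
    return " ".join(tok for op, tok in actions if op != "delete")
```

Most tokens of a typical comment update end up as "keep" actions, which is the intuition behind editing the old comment instead of regenerating it.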

Citations: 2

So many brackets! An analysis of how SQL learners (mis)manage complexity during query formulation

2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)

Pub Date : 2022-05-01 DOI: 10.1145/3524610.3529158

Daphne Miedema, G. Fletcher, Efthimia Aivaloglou

The Structured Query Language (SQL) is a widely taught database query language in computer science, data science, and software engineering programs. While highly expressive, SQL is challenging to learn for novices. Various research has explored the errors and mistakes that SQL users make. Specific attributes of SQL code, such as the number of tables and the degree of nesting, have been found to impact its understandability and maintainability. Furthermore, prior studies have shown that novices have significant issues using SQL correctly, due to factors such as expressive ease, existing knowledge and misconceptions, and the impact of cognitive load. In this paper we identify another factor: self-inflicted query complexity, where users hinder their own problem solving process. We analyse 8K intermediate and final student attempts to six SQL exercises, approaching complexity from four perspectives: correctness, execution order, edit distance and query intricacy. Through our analyses, we find that our students are hindered in their query formulation process by mismanaging complexity through writing overly elaborate queries containing unnecessary elements, overusing brackets and nesting, and incrementally building queries with persistent errors.
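
One simple way to quantify bracket overuse is the maximum nesting depth of parentheses in a query. The sketch below is an assumed, simplified measure for illustration: it ignores parentheses inside string literals and comments, and it is not claimed to be the intricacy metric used in the paper.

```python
def max_bracket_depth(query):
    """Return the deepest level of parenthesis nesting in a SQL query string."""
    depth = 0
    deepest = 0
    for ch in query:
        if ch == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("closing bracket without a matching opening one")
    if depth != 0:
        raise ValueError("unclosed opening bracket")
    return deepest
```

A query such as `SELECT name FROM t WHERE ((age > 18))` has depth 2 even though no brackets are needed, which is the kind of self-inflicted complexity the abstract describes.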

Citations: 4

The Effect of Information Content and Length on Name Recollection

2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)

Pub Date : 2022-05-01 DOI: 10.1145/3524610.3529159

Asaf Etgar, Ram Friedman, Shaked Haiman, Dana Perez, D. Feitelson

Memorable function and variable names are useful for developers: they reduce the need to re-check how objects are named when one wants to use them, and they enhance comprehension when encountered while reading code. We look at the possible interplay between the information contained in names and how memorable they are. We show in two independent experiments involving a total of 190 subjects that informative names are usually easier to recollect than similar-length names which contain less focused information. Interestingly, we find that less-experienced and female participants are better at remembering the less informative names. We also find that short names, which are not just abbreviated but actually contain less information, are significantly more memorable. Hence a good choice would be to use the shortest name that includes the most focused and pertinent information.

Citations: 4

Exploring and Understanding Cross-service Code Clones in Microservice Projects

2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)

Pub Date : 2022-05-01 DOI: 10.1145/3524610.3527925

Yang Zhao, Ran Mo, Yao Zhang, Siyuan Zhang, Pu Xiong

Microservice is an architecture style that decomposes complex software into loosely coupled services, which can be developed, maintained, and deployed independently. In recent years, the microservice architecture has been drawing more and more attention from both industrial and academic communities. Many companies, such as Google, Netflix, Amazon, and IBM, have applied microservice architecture in their projects. Researchers have also studied microservices in different directions, such as microservices extraction, fault localization, and code quality analysis. Recent work has shown that cross-service code clones are prevalent in microservice projects and have caused considerable co-modifications among different services, which undermines the independence of microservices. But there is no systematic study to reveal the underlying reasons for the emergence of such clones. In this paper, we first build a dataset consisting of 2,722 pairs of cross-service clones from 22 open-source microservice projects. Then we manually inspect the implementations of files and methods involved in cross-service clones to understand why the clones are introduced. In the file-level analysis, we categorize files into three types: DPFile (Data-processing File), DRFile (Data-related File), and DIFile (Data-irrelevant File), and show that DRFiles are more likely to encounter cross-service clones. For each type of file, we further classify them into specific cases. Each case describes the characteristics of involved files and why the clones happen. In the method-level analysis, we dig information from the code of involved methods. On this basis, we propose a catalog containing 4 categories with 10 subcategories of method-level implementations that result in cross-service clones. We believe our analyses have provided the fundamental knowledge of cross-service clones, which can help developers better manage and resolve such clones in microservice projects.

Citations: 1

Fixing Continuous Integration Tests From Within the IDE With Contextual Information

2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)

Pub Date : 2022-05-01 DOI: 10.1145/3524610.3527908

Casper Boone, C. Brandt, A. Zaidman

The most common reason for Continuous Integration (CI) builds to break is failing tests. When a build breaks, a developer often has to scroll through hundreds to thousands of log lines to find which test is failing and why. Finding the issue is a tedious process that relies on a developer's experience and increases the cost of software testing. We investigate how presenting different kinds of contextual information about CI builds in the Integrated Development Environment (IDE) impacts the time developers take to fix a broken build. Our IntelliJ plugin TESTAXIS surfaces additional information such as a unique view of the code under test that was changed leading up to the build failure. We conduct a user experiment and show that TESTAXIS helps developers fix failing tests 13.4% to 48.6% faster. The participants found the features of TESTAXIS useful and would incorporate it in their development workflow to save time. With TESTAXIS we take an important step towards removing the need to manually inspect build logs and bringing CI build results to the IDE, ultimately saving developers time.

Citations: 1

An Approach to Automatically Assess Method Names

2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)

Pub Date : 2022-05-01 DOI: 10.1145/3524610.3527780

Reem S. Alsuhaibani, Christian D. Newman, M. J. Decker, M. Collard, Jonathan I. Maletic

An approach is presented to automatically assess the quality of method names by providing a score and feedback. The approach implements ten method naming standards to evaluate the names. The naming standards are taken from work that validated the standards via a large survey of software professionals. Natural language processing techniques such as part-of-speech tagging, identifier splitting, and dictionary lookup are required to implement the standards. The approach is evaluated by first manually constructing a large golden set of method names. Each method name is rated by several developers and labeled as conforming to each standard or not. These ratings allow for comparing the results of the approach against expert assessment. Additionally, the approach is applied to several systems and the results are manually inspected for accuracy.
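
Two of the techniques the abstract names, identifier splitting and dictionary lookup, can be sketched in a few lines. The regular expression, the function names, and the idea of reporting unknown words are assumptions for this example. The ten naming standards themselves are not listed in the abstract and are not reproduced here.

```python
import re

# Matches, in order of preference: an acronym run (e.g. "HTTP"), a word that may
# start with one capital (e.g. "Response", "get"), or a run of digits.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_identifier(name):
    """Split a camelCase or snake_case method name into lowercase words."""
    words = []
    for chunk in name.split("_"):
        words.extend(_WORD.findall(chunk))
    return [w.lower() for w in words]


def unknown_words(name, dictionary):
    """Return the words of a method name that are not in the given dictionary."""
    return [w for w in split_identifier(name) if w not in dictionary]
```

With a dictionary containing `get`, `user`, and `name`, the method name `getUsrName` would be flagged for the abbreviation `usr`.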

Citations: 1
