
Latest publications: 2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)

A First Look at Duplicate and Near-duplicate Self-admitted Technical Debt Comments
Pub Date : 2022-03-30 DOI: 10.1145/3524610.3528387
Jerin Yasmin, M. Sheikhaei, Yuan Tian
Self-admitted technical debt (SATD) refers to technical debt that is intentionally introduced by developers and explicitly documented in code comments or other software artifacts (e.g., issue reports) to annotate sub-optimal decisions made by developers in the software development process. In this work, we take the first look at the existence and characteristics of duplicate and near-duplicate SATD comments in five popular Apache OSS projects, i.e., JSPWiki, Helix, Jackrabbit, Archiva, and SystemML. We design a method to automatically identify groups of duplicate and near-duplicate SATD comments and track their evolution in the software system by mining the commit history of a software project. Leveraging the proposed method, we identified 3,520 duplicate and near-duplicate SATD comments from the target projects, which belong to 1,141 groups. We manually analyze the content and context of a sample of 1,505 SATD comments (by sampling 100 groups for each project) and identify whether they annotate the same root cause. We also investigate whether duplicate SATD comments exist in code clones, whether they co-exist in the same file, and whether they are introduced and removed simultaneously. Our preliminary study reveals several surprising findings that shed light on future studies aiming to improve the management of duplicate SATD comments. For instance, only 48.5% of duplicate SATD comment groups with the same root cause exist in regular code clones, and only 33.9% of the duplicate SATD comment pairs are introduced in the same commit.
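The paper's detection method is not reproduced here, but the core idea of grouping duplicate and near-duplicate comments can be sketched with text normalization plus a similarity threshold. The `difflib`-based similarity and the 0.9 cutoff below are illustrative assumptions, not the authors' actual algorithm:

```python
import difflib
import re

def normalize(comment: str) -> str:
    """Lowercase a comment and collapse punctuation/whitespace noise."""
    return re.sub(r"\W+", " ", comment.lower()).strip()

def group_near_duplicates(comments, threshold=0.9):
    """Greedily cluster comments whose normalized texts are similar.

    Exact duplicates have a similarity ratio of 1.0; near-duplicates
    fall just below it. Each group is represented by its first member.
    """
    groups = []
    for comment in comments:
        norm = normalize(comment)
        for group in groups:
            rep = normalize(group[0])
            if difflib.SequenceMatcher(None, norm, rep).ratio() >= threshold:
                group.append(comment)
                break
        else:
            groups.append([comment])
    return groups

satd = [
    "TODO: fix this hack later",
    "todo: fix this hack later!",   # near-duplicate of the first
    "FIXME: handle null input",
]
print(group_near_duplicates(satd))  # the two TODOs land in one group
```

Tracking when groups appear and disappear across a project's commit history, as the paper does, would then amount to re-running such grouping on each commit's comment set.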
{"title":"A First Look at Duplicate and Near-duplicate Self-admitted Technical Debt Comments","authors":"Jerin Yasmin, M. Sheikhaei, Yuan Tian","doi":"10.1145/3524610.3528387","DOIUrl":"https://doi.org/10.1145/3524610.3528387","url":null,"abstract":"Self-admitted technical debt (SATD) refers to technical debt that is intentionally introduced by developers and explicitly documented in code comments or other software artifacts (e.g., issue reports) to annotate sub-optimal decisions made by developers in the software development process. In this work, we take the first look at the existence and char-acteristics of duplicate and near-duplicate SATD comments in five popular Apache OSS projects, i.e., JSPWiki, Helix, Jackrab-bit, Archiva, and SystemML. We design a method to automatically identify groups of duplicate and near-duplicate SATD comments and track their evolution in the software system by mining the com-mit history of a software project. Leveraging the proposed method, we identified 3,520 duplicate and near-duplicate SATD comments from the target projects, which belong to 1,141 groups. We man-ually analyze the content and context of a sample of 1,505 SATD comments (by sampling 100 groups for each project) and identify if they annotate the same root cause. We also investigate whether du-plicate SATD comments exist in code clones, whether they co-exist in the same file, and whether they are introduced and removed simultaneously. Our preliminary study reveals several surprising findings that would shed light on future studies aiming to improve the management of duplicate SATD comments. 
For instance, only 48.5% duplicate SATD comment groups with the same root cause exist in regular code clones, and only 33.9% of the duplicate SATD comment pairs are introduced in the same commit.","PeriodicalId":426634,"journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114188243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Error Identification Strategies for Python Jupyter Notebooks
Pub Date : 2022-03-30 DOI: 10.1145/3524610.3529156
Derek Robinson, Neil A. Ernst, Enrique Larios Vargas, M. Storey
Computational notebooks, such as Jupyter or Colab, combine text and data analysis code. They have become ubiquitous in the world of data science and exploratory data analysis. Since these notebooks present a different programming paradigm than conventional IDE-driven programming, it is plausible that debugging in computational notebooks might also be different. More specifically, since creating notebooks blends domain knowledge, statistical analysis, and programming, the ways in which notebook users find and fix errors in these different forms might also differ. In this paper, we present an exploratory, observational study on how Python Jupyter notebook users find and understand potential errors in notebooks. Through a conceptual replication of a study design investigating the error identification strategies of R notebook users, we presented users with Python Jupyter notebooks pre-populated with common notebook errors: errors rooted in the statistical data analysis, in the knowledge of domain concepts, or in the programming. We then analyzed the strategies our study participants used to find these errors and determined how successful each strategy was at identifying errors. Our findings indicate that while the notebook programming environment is different from the environments used for traditional programming, debugging strategies remain quite similar. It is our hope that the insights presented in this paper will help both notebook tool designers and educators make changes that improve how easily data scientists discover errors in the notebooks they write.
{"title":"Error Identification Strategies for Python Jupyter Notebooks","authors":"Derek Robinson, Neil A. Ernst, Enrique Larios Vargas, M. Storey","doi":"10.1145/3524610.3529156","DOIUrl":"https://doi.org/10.1145/3524610.3529156","url":null,"abstract":"Computational notebooks-such as Jupyter or Colab-combine text and data analysis code. They have become ubiquitous in the world of data science and exploratory data analysis. Since these notebooks present a different programming paradigm than conventional IDE-driven programming, it is plausible that debugging in computational notebooks might also be different. More specifically, since creating notebooks blends domain knowledge, statistical analysis, and programming, the ways in which notebook users find and fix errors in these different forms might be different. In this paper, we present an exploratory, observational study on how Python Jupyter notebook users find and understand potential errors in notebooks. Through a conceptual replication of study design investigating the error identification strategies of R notebook users, we presented users with Python Jupyter notebooks pre-populated with common notebook errors-errors rooted in either the statistical data analysis, the knowledge of domain concepts, or in the programming. We then analyzed the strategies our study participants used to find these errors and determined how successful each strategy was at identifying errors. Our findings indicate that while the notebook programming environment is different from the environments used for traditional programming, debugging strategies remain quite similar. 
It is our hope that the insights presented in this paper will help both notebook tool designers and educators make changes to improve how data scientists discover errors more easily in the notebooks they write.","PeriodicalId":426634,"journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133312494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Demystifying Software Release Note Issues on GitHub
Pub Date : 2022-03-29 DOI: 10.1145/3524610.3527919
Jianyu Wu, Hao He, Wenxin Xiao, Kai Gao, Minghui Zhou
Release notes (RNs) summarize the main changes between two consecutive software versions and serve as a central source of information when users upgrade software. While producing high-quality RNs can be hard and poses a variety of challenges to developers, a comprehensive empirical understanding of these challenges is still lacking. In this paper, we bridge this knowledge gap by manually analyzing 1,731 of the latest GitHub issues to build a comprehensive taxonomy of RN issues with four dimensions: Content, Presentation, Accessibility, and Production. Among these issues, nearly half (48.47%) focus on Production; Content, Accessibility, and Presentation account for 25.61%, 17.65%, and 8.27%, respectively. We find that: 1) RN producers are more likely to miss information than to include incorrect information, especially for breaking changes; 2) improper layout may bury important information and confuse users; 3) many users find RNs inaccessible due to link deterioration, lack of notification, and obfuscated RN locations; 4) automating and regulating RN production remains challenging despite the great needs of RN producers. Our taxonomy not only pictures a roadmap to improve RN production in practice but also reveals interesting future research directions for automating RN production.
{"title":"Demystifying Software Release Note Issues on GitHub","authors":"Jianyu Wu, Hao He, Wenxin Xiao, Kai Gao, Minghui Zhou","doi":"10.1145/3524610.3527919","DOIUrl":"https://doi.org/10.1145/3524610.3527919","url":null,"abstract":"Release notes (RNs) summarize main changes between two consecutive software versions and serve as a central source of information when users upgrade software. While producing high quality RNs can be hard and poses a variety of challenges to developers, a comprehensive empirical understanding of these challenges is still lacking. In this paper, we bridge this knowledge gap by manually analyzing 1,731 latest GitHub issues to build a comprehensive taxonomy of RN issues with four dimensions: Content, Presentation, Accessibility, and Production. Among these issues, nearly half (48.47%) of them focus on Production; Content, Accessibility, and Presentation take 25.61 %, 17.65%, and 8.27%, respectively. We find that: 1) RN producers are more likely to miss information than to include incorrect information, especially for breaking changes; 2) improper layout may bury important information and confuse users; 3) many users find RNs inaccessible due to link deterioration, lack of notification, and obfuscate RN locations; 4) automating and regulating RN production remains challenging despite the great needs of RN producers. 
Our taxonomy not only pictures a roadmap to improve RN production in practice but also reveals interesting future research directions for automating RN production.","PeriodicalId":426634,"journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127916708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Understanding Code Snippets in Code Reviews: A Preliminary Study of the OpenStack Community
Pub Date : 2022-03-29 DOI: 10.1145/3524610.3527884
Liming Fu, Peng Liang, Beiqi Zhang
Code review is a mature practice for software quality assurance in software development, in which reviewers check the code that has been committed by developers and verify its quality. During code review discussions, reviewers and developers might use code snippets to provide necessary information (e.g., suggestions or explanations). However, little is known about the intentions and impacts of code snippets in code reviews. To this end, we conducted a preliminary study to investigate the nature of code snippets and their purposes in code reviews. We manually collected and checked 10,790 review comments from the Nova and Neutron projects of the OpenStack community, and finally obtained 626 review comments that contain code snippets for further analysis. The results show that: (1) code snippets are not prevalently used in code reviews, and most code snippets are provided by reviewers; (2) we identified two high-level purposes of code snippets provided by reviewers (i.e., Suggestion and Citation) with six detailed purposes, among which Improving Code Implementation is the most common; (3) of the code snippets provided in code reviews with the aim of suggestion, around 68.1% were accepted by developers. The results highlight promising research directions on using code snippets in code reviews.
{"title":"Understanding Code Snippets in Code Reviews: A Preliminary Study of the OpenStack Community","authors":"Liming Fu, Peng Liang, Beiqi Zhang","doi":"10.1145/3524610.3527884","DOIUrl":"https://doi.org/10.1145/3524610.3527884","url":null,"abstract":"Code review is a mature practice for software quality assurance in software development with which reviewers check the code that has been committed by developers, and verify the quality of code. During the code review discussions, reviewers and developers might use code snippets to provide necessary information (e.g., suggestions or explanations). However, little is known about the intentions and impacts of code snippets in code reviews. To this end, we conducted a preliminary study to investigate the nature of code snippets and their purposes in code reviews. We manually collected and checked 10,790 review comments from the Nova and Neutron projects of the OpenStack community, and finally obtained 626 review comments that contain code snippets for further analysis. The results show that: (1) code snippets are not prevalently used in code reviews, and most of the code snippets are provided by reviewers. (2) We identified two high-level purposes of code snippets provided by reviewers (i.e., Suggestion and Citation) with six detailed purposes, among which, Improving Code Implementation is the most common purpose. (3) For the code snippets in code reviews with the aim of suggestion, around 68.1% was accepted by developers. 
The results highlight promising research directions on using code snippets in code reviews.","PeriodicalId":426634,"journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127908643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Does Coding in Pythonic Zen Peak Performance? Preliminary Experiments of Nine Pythonic Idioms at Scale
Pub Date : 2022-03-28 DOI: 10.1145/3524610.3527879
P. Leelaprute, Bodin Chinthanet, Supatsara Wattanakriengkrai, R. Kula, Pongchai Jaisri, T. Ishio
In the field of data science, and for academics in general, the Python programming language is a popular choice, mainly because of its libraries for storing, manipulating, and gaining insight from data. Evidence includes the versatile set of machine learning, data visualization, and manipulation packages used for the ever-growing size of available data. The Zen of Python is a set of guiding design principles that developers use to write acceptable and elegant Python code. Most principles revolve around simplicity. However, as the need to compute large amounts of data grows, performance has become a necessity for the Python programmer. The new idea in this paper is to confirm whether writing code the Pythonic way peaks performance at scale. As a starting point, we conduct a set of preliminary experiments to evaluate nine Pythonic code examples by comparing the performance of Pythonic and non-Pythonic code snippets. Our results reveal that writing in Pythonic idioms may save memory and time. We show that incorporating the list comprehension, generator expression, zip, and itertools.zip_longest idioms can save up to 7,000 MB of memory and up to 32.25 seconds of run time. The results open more questions on how these idioms could be utilized in a real-world setting. The replication package includes all scripts, and the results are available at https://doi.org/10.5281/zenodo.5712349
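Two of the idioms named in the abstract can be contrasted directly. The micro-benchmark below is a minimal sketch, not the paper's experimental setup; the input size N is an arbitrary assumption, and absolute timings will vary by machine:

```python
import timeit
import tracemalloc

N = 200_000  # arbitrary input size for illustration

def non_pythonic():
    # Explicit loop with append(): the non-Pythonic variant.
    result = []
    for i in range(N):
        result.append(i * i)
    return result

def pythonic():
    # The same computation as a list comprehension.
    return [i * i for i in range(N)]

def pythonic_lazy():
    # A generator expression never materializes the full list.
    return sum(i * i for i in range(N))

# Compare wall-clock time of the eager variants.
t_loop = timeit.timeit(non_pythonic, number=5)
t_comp = timeit.timeit(pythonic, number=5)
print(f"loop: {t_loop:.3f}s  comprehension: {t_comp:.3f}s")

# Measure peak memory of the generator-based sum.
tracemalloc.start()
pythonic_lazy()
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"generator peak memory: {peak / 1024:.1f} KiB")
```

On typical CPython builds the comprehension edges out the explicit loop, and the generator's peak memory stays far below the size of the materialized list, which is consistent with the direction of the paper's findings.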
{"title":"Does Coding in Pythonic Zen Peak Performance? Preliminary Experiments of Nine Pythonic Idioms at Scale","authors":"P. Leelaprute, Bodin Chinthanet, Supatsara Wattanakriengkrai, R. Kula, Pongchai Jaisri, T. Ishio","doi":"10.1145/3524610.3527879","DOIUrl":"https://doi.org/10.1145/3524610.3527879","url":null,"abstract":"In the field of data science, and for academics in general, the Python programming language is a popular choice, mainly because of its libraries for storing, manipulating, and gaining insight from data. Evidence includes the versatile set of machine learning, data visualization, and manipulation packages used for the ever-growing size of available data. The Zen of Python is a set of guiding design principles that developers use to write acceptable and elegant Python code. Most principles revolve around simplicity. However, as the need to compute large amounts of data, performance has become a necessity for the Python programmer. The new idea in this paper is to confirm whether writing the Pythonic way peaks performance at scale. As a starting point, we conduct a set of preliminary experiments to evaluate nine Pythonic code examples by comparing the performance of both Pythonic and Non-Pythonic code snippets. Our results reveal that writing in Pythonic idioms may save memory and time. We show that incorporating list comprehension, generator expression, zip, and itertools.zip_longest idioms can save up to 7,000 MB and up to 32.25 seconds. The results open more questions on how they could be utilized in a real-world setting. 
The replication package includes all scripts, and the results are available at https://doi.org/10.5281/zenodo.5712349","PeriodicalId":426634,"journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130187119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
HELoC: Hierarchical Contrastive Learning of Source Code Representation
Pub Date : 2022-03-27 DOI: 10.1145/3524610.3527896
Xiao Wang, Qiong Wu, Hongyu Zhang, Chen Lyu, Xue Jiang, Zhuoran Zheng, Lei Lyu, Songlin Hu
Abstract syntax trees (ASTs) play a crucial role in source code representation. However, due to the large number of nodes in an AST and the typically deep AST hierarchy, it is challenging to learn the hierarchical structure of an AST effectively. In this paper, we propose HELoC, a hierarchical contrastive learning model for source code representation. To effectively learn the AST hierarchy, we use contrastive learning to allow the network to predict the AST node level and learn the hierarchical relationships between nodes in a self-supervised manner, which makes the representation vectors of nodes with greater differences in AST levels farther apart in the embedding space. By using such vectors, the structural similarities between code snippets can be measured more precisely. In the learning process, a novel GNN (called Residual Self-attention Graph Neural Network, RSGNN) is designed, which enables HELoC to focus on embedding the local structure of an AST while capturing its overall structure. HELoC is self-supervised and can be applied to many source code related downstream tasks such as code classification, code clone detection, and code clustering after pre-training. Our extensive experiments demonstrate that HELoC outperforms the state-of-the-art source code representation models.
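The "AST node level" that HELoC's contrastive objective predicts can be illustrated with Python's built-in ast module. This is a toy depth computation for intuition only, unrelated to the paper's RSGNN implementation:

```python
import ast

def node_levels(source: str):
    """Return (node_type, depth) pairs for every AST node.

    Depth from the root Module node stands in for the hierarchical
    'node level' that a model like HELoC learns to predict.
    """
    levels = []

    def walk(node, depth):
        levels.append((type(node).__name__, depth))
        for child in ast.iter_child_nodes(node):
            walk(child, depth + 1)

    walk(ast.parse(source), 0)
    return levels

for name, depth in node_levels("def f(x):\n    return x + 1\n"):
    print("  " * depth + f"{name} (level {depth})")
```

Even this one-line function yields a tree several levels deep (Module, FunctionDef, Return, BinOp, ...), which hints at why real ASTs are large and deeply nested, the difficulty the paper targets.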
{"title":"HELoC: Hierarchical Contrastive Learning of Source Code Representation","authors":"Xiao Wang, Qiong Wu, Hongyu Zhang, Chen Lyu, Xue Jiang, Zhuoran Zheng, Lei Lyu, Songlin Hu","doi":"10.1145/3524610.3527896","DOIUrl":"https://doi.org/10.1145/3524610.3527896","url":null,"abstract":"Abstract syntax trees (ASTs) play a crucial role in source code representation. However, due to the large number of nodes in an AST and the typically deep AST hierarchy, it is challenging to learn the hierarchical structure of an AST effectively. In this paper, we propose HELoC, a hierarchical contrastive learning model for source code representation. To effectively learn the AST hierarchy, we use contrastive learning to allow the network to predict the AST node level and learn the hierarchical relationships between nodes in a self-supervised manner, which makes the representation vectors of nodes with greater differences in AST levels farther apart in the embedding space. By using such vectors, the structural similarities between code snippets can be measured more precisely. In the learning process, a novel GNN (called Residual Self-attention Graph Neural Network, RSGNN) is designed, which enables HELoC to focus on embedding the local structure of an AST while capturing its overall structure. HELoC is self-supervised and can be applied to many source code related downstream tasks such as code classification, code clone detection, and code clustering after pre-training. 
Our extensive experiments demonstrate that HELoC outperforms the state-of-the-art source code representation models.","PeriodicalId":426634,"journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127761457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Anchoring Code Understandability Evaluations Through Task Descriptions
Pub Date : 2022-03-25 DOI: 10.1145/3524610.3527904
Marvin Wyrich, Lasse Merz, D. Graziotin
In code comprehension experiments, participants are usually told at the beginning what kind of code comprehension task to expect. Describing experiment scenarios and experimental tasks will influence participants in ways that are sometimes hard to predict and control. In particular, describing or even mentioning the difficulty of a code comprehension task might anchor participants and their perception of the task itself. In this study, we investigated in a randomized, controlled experiment with 256 participants (50 software professionals and 206 computer science students) whether a hint about the difficulty of the code to be understood in a task description anchors participants in their own code comprehensibility ratings. Subjective code evaluations are a commonly used measure for how well a developer in a code comprehension study understood code. Accordingly, it is important to understand how robust these measures are to cognitive biases such as the anchoring effect. Our results show that participants are significantly influenced by the initial scenario description in their assessment of code comprehensibility. An initial hint of hard-to-understand code leads participants to assess the code as harder to understand than participants who received no hint or a hint of easy-to-understand code. This affects students and professionals alike. We discuss examples of design decisions and contextual factors in the conduct of code comprehension experiments that can induce an anchoring effect, and recommend the use of more robust comprehension measures in code comprehension studies to enhance the validity of results.
{"title":"Anchoring Code Understandability Evaluations Through Task Descriptions","authors":"Marvin Wyrich, Lasse Merz, D. Graziotin","doi":"10.1145/3524610.3527904","DOIUrl":"https://doi.org/10.1145/3524610.3527904","url":null,"abstract":"In code comprehension experiments, participants are usually told at the beginning what kind of code comprehension task to expect. Describing experiment scenarios and experimental tasks will influence participants in ways that are sometimes hard to predict and control. In particular, describing or even mentioning the difficulty of a code comprehension task might anchor participants and their perception of the task itself. In this study, we investigated in a randomized, controlled experiment with 256 participants (50 software professionals and 206 computer science students) whether a hint about the difficulty of the code to be understood in a task description anchors participants in their own code comprehensibility ratings. Subjective code evaluations are a commonly used measure for how well a developer in a code comprehension study understood code. Accordingly, it is important to understand how robust these measures are to cognitive biases such as the anchoring effect. Our results show that participants are significantly influenced by the initial scenario description in their assessment of code com-prehensibility. An initial hint of hard to understand code leads participants to assess the code as harder to understand than partic-ipants who received no hint or a hint of easy to understand code. This affects students and professionals alike. 
We discuss examples of design decisions and contextual factors in the conduct of code comprehension experiments that can induce an anchoring effect, and recommend the use of more robust comprehension measures in code comprehension studies to enhance the validity of results.","PeriodicalId":426634,"journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)","volume":"262 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117166425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
PTM4Tag: Sharpening Tag Recommendation of Stack Overflow Posts with Pre-trained Models
Pub Date : 2022-03-21 DOI: 10.1145/3524610.3527897
Junda He, Bowen Xu, Zhou Yang, Donggyun Han, Chengran Yang, David Lo
Stack Overflow is often viewed as one of the most influential Software Question & Answer (SQA) websites, containing millions of programming-related questions and answers. Tags play a critical role in efficiently structuring the contents in Stack Overflow and are vital to support a range of site operations, e.g., querying relevant contents. Poorly selected tags often introduce extra noise and redundancy, which raises problems like tag synonyms and tag explosion. Thus, an automated tag recommendation technique that can accurately recommend high-quality tags is desired to alleviate the problems mentioned above. Inspired by the recent success of pre-trained language models (PTMs) in natural language processing (NLP), we present PTM4Tag, a tag recommendation framework for Stack Overflow posts that utilizes PTMs with a triplet architecture, which models the components of a post, i.e., Title, Description, and Code, with independent language models. To the best of our knowledge, this is the first work that leverages PTMs in the tag recommendation task of SQA sites. We comparatively evaluate the performance of PTM4Tag based on five popular pre-trained models: BERT, RoBERTa, ALBERT, CodeBERT, and BERTOverflow. Our results show that leveraging CodeBERT, a software engineering (SE) domain-specific PTM, in PTM4Tag achieves the best performance among the five considered PTMs and outperforms the state-of-the-art convolutional neural network-based approach by a large margin in terms of average Precision@k, Recall@k, and F1-score@k. We conduct an ablation study to quantify the contribution of a post's constituent components (Title, Description, and Code Snippets) to the performance of PTM4Tag. Our results show that Title is the most important in predicting the most relevant tags, and utilizing all the components achieves the best performance.
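The evaluation metrics named in the abstract (Precision@k, Recall@k, F1-score@k) can be computed per post as follows. This uses one common definition; the paper may average across posts differently:

```python
def metrics_at_k(recommended, ground_truth, k):
    """Precision@k, Recall@k, and F1-score@k for a single post.

    `recommended` is a ranked list of predicted tags;
    `ground_truth` is the set of tags the post actually carries.
    """
    hits = len(set(recommended[:k]) & set(ground_truth))
    precision = hits / k
    recall = hits / len(ground_truth)
    f1 = 0.0 if precision + recall == 0 else (
        2 * precision * recall / (precision + recall))
    return precision, recall, f1

# Hypothetical example: 2 of the top-3 recommended tags are correct.
recommended = ["python", "pandas", "dataframe", "numpy", "csv"]
ground_truth = {"python", "pandas", "csv"}
p, r, f1 = metrics_at_k(recommended, ground_truth, k=3)
print(round(p, 2), round(r, 2), round(f1, 2))  # → 0.67 0.67 0.67
```

Averaging these per-post scores over a test set yields the averaged metrics the abstract reports.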
Citations: 19
Example-Based Vulnerability Detection and Repair in Java Code
Pub Date : 2022-03-17 DOI: 10.1145/3524610.3527895
Y. Zhang, Ya Xiao, Md Mahir Asef Kabir, D. Yao, Na Meng
The Java libraries JCA and JSSE offer cryptographic APIs to facilitate secure coding. When developers misuse some of the APIs, their code becomes vulnerable to cyber-attacks. To eliminate such vulnerabilities, people built tools to detect security-API misuses via pattern matching. However, most tools do not (1) fix misuses or (2) allow users to extend their pattern sets. To overcome both limitations, we created Seader, an example-based approach to detect and repair security-API misuses. Given an exemplar ⟨insecure, secure⟩ code pair, Seader compares the snippets to infer any API-misuse template and the corresponding fixing edit. Based on the inferred info, given a program, Seader performs inter-procedural static analysis to search for security-API misuses and to propose customized fixes. For evaluation, we applied Seader to 28 ⟨insecure, secure⟩ code pairs; Seader successfully inferred 21 unique API-misuse templates and related fixes. With these ⟨vulnerability, fix⟩ patterns, we applied Seader to a program benchmark that has 86 known vulnerabilities. Seader detected vulnerabilities with 95% precision, 72% recall, and an 82% F-score. We also applied Seader to 100 open-source projects and manually checked 77 suggested repairs; 76 of the repairs were correct. Seader can help developers correctly use security APIs.
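Seader itself infers templates over program representations and applies inter-procedural static analysis; as a toy sketch of the core idea only, inferring a fix template from one ⟨insecure, secure⟩ pair can be approximated with a token-level diff. The snippets and the cipher transformations below are invented for illustration and are not taken from the paper:

```python
import difflib

def infer_fix_template(insecure, secure):
    """Toy template inference: diff the token sequences of an insecure
    snippet and its secure counterpart, keep the replaced span."""
    a, b = insecure.split(), secure.split()
    for op, i1, i2, j1, j2 in difflib.SequenceMatcher(a=a, b=b).get_opcodes():
        if op == "replace":
            return " ".join(a[i1:i2]), " ".join(b[j1:j2])
    return None

def apply_fix(code, template):
    """Toy repair: rewrite occurrences of the insecure fragment."""
    bad, good = template
    return code.replace(bad, good)

# Hypothetical exemplar pair: weak vs. strong cipher transformation.
insecure = 'Cipher c = Cipher.getInstance("DES/ECB/PKCS5Padding");'
secure   = 'Cipher c = Cipher.getInstance("AES/GCM/NoPadding");'
template = infer_fix_template(insecure, secure)

# A new program fragment containing the same misuse gets repaired.
vulnerable = 'Cipher enc = Cipher.getInstance("DES/ECB/PKCS5Padding");'
repaired = apply_fix(vulnerable, template)
```

A real tool must reason about aliasing, data flow across procedures, and context conditions, which plain string matching cannot capture.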
Citations: 12
Code Smells in Elixir: Early Results from a Grey Literature Review
Pub Date : 2022-03-16 DOI: 10.1145/3524610.3527881
L. F. M. Vegi, M. T. Valente
Elixir is a new functional programming language whose popularity is rising in industry. However, few works in the literature have focused on studying the internal quality of systems implemented in this language. In particular, to the best of our knowledge, there is currently no catalog of code smells for Elixir. Therefore, in this paper, through a grey literature review, we investigate whether Elixir developers discuss code smells. Our preliminary results indicate that 11 of the 22 traditional code smells cataloged by Fowler and Beck are discussed by Elixir developers. We also propose a list of 18 new smells specific to Elixir systems and investigate whether these smells are currently identified by Credo, a well-known static code analysis tool for Elixir. We conclude that only two traditional code smells and one Elixir-specific code smell are automatically detected by this tool. Thus, these early results represent an opportunity for extending tools such as Credo to detect code smells and thereby contribute to improving the internal quality of Elixir systems.
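Credo-style checks walk the parsed Elixir source and report rule violations with a location and a message. As a loose illustration in Python (real Credo checks are written in Elixir against its AST), a lexical rule flagging one classic smell, overly long parameter lists, might look like the sketch below; the threshold, regex, and sample module are all invented:

```python
import re

# Hypothetical threshold: flag functions with more than 4 parameters.
MAX_PARAMS = 4
# Matches "def name(params)" or "defp name(params)" on a single line.
DEF_RE = re.compile(r"defp?\s+(\w+)\(([^)]*)\)")

def long_parameter_list(source):
    """Toy Credo-style check: report Elixir functions whose parameter
    list exceeds MAX_PARAMS, as (line, name, arity) findings."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        m = DEF_RE.search(line)
        if not m:
            continue
        name, params = m.groups()
        arity = len([p for p in params.split(",") if p.strip()])
        if arity > MAX_PARAMS:
            findings.append((lineno, name, arity))
    return findings

sample = """
defmodule Report do
  def build(title, author, date, rows, footer, style) do
    :ok
  end

  defp render(rows) do
    rows
  end
end
"""
findings = long_parameter_list(sample)
```

A production check would operate on the AST rather than on raw lines, so multi-line heads, default arguments, and pattern-matched parameters are handled correctly.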
Citations: 4
Journal
2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)