The comprehension of source code is a task inherent to many software development activities. Code change, code review, and debugging are examples of these activities that depend heavily on developers' understanding of the source code. This ability is threatened when developers' cognitive load approaches the limits of their working memory, which in turn affects their understanding and makes them more prone to errors. Measures capturing humans' behavior and changes in their physiological state have been proposed in a number of studies to investigate developers' cognitive load. However, the majority of the existing approaches operate at a coarse-grained task level, estimating the difficulty of the source code as a whole. Hence, they cannot be used to pinpoint its mentally demanding parts. We address this limitation in this paper through a non-intrusive approach based on eye-tracking. We collect users' behavioral and physiological features while they engage with source code and train a set of machine learning models to estimate the mentally demanding parts of code. The evaluation of our models returns F1, recall, accuracy, and precision scores of up to 85.65%, 84.25%, 86.24%, and 88.61%, respectively, when estimating the mentally demanding fragments of code. Our approach enables a fine-grained analysis of cognitive load and makes it possible to identify the parts that challenge the comprehension of source code. Such an approach provides the means to test new hypotheses addressing the characteristics of specific parts within the source code and paves the way for novel techniques for code review and adaptive e-learning.
{"title":"Estimating Developers' Cognitive Load at a Fine-grained Level Using Eye-tracking Measures","authors":"Amine Abbad Andaloussi, Thierry Sorg, Barbara Weber","doi":"10.1145/3524610.3527890","DOIUrl":"https://doi.org/10.1145/3524610.3527890","url":null,"abstract":"The comprehension of source code is a task inherent to many software development activities. Code change, code review and debugging are examples of these activities that depend heavily on developers' understanding of the source code. This ability is threatened when developers' cognitive load approaches the limits of their working memory, which in turn affects their understanding and makes them more prone to errors. Measures capturing humans' behavior and changes in their physiological state have been proposed in a number of studies to investigate developers' cognitive load. However, the majority of the existing approaches operate at a coarse-grained task level estimating the difficulty of the source code as a whole. Hence, they cannot be used to pinpoint the mentally demanding parts of it. We address this limitation in this paper through a non-intrusive approach based on eye-tracking. We collect users' behavioral and physiological features while they are engaging with source code and train a set of machine learning models to estimate the mentally demanding parts of code. The evaluation of our models returns F1, recall, accuracy and precision scores up to 85.65%, 84.25%, 86.24% and 88.61%, respectively, when estimating the mental demanding fragments of code. Our approach enables a fine-grained analysis of cognitive load and allows identifying the parts challenging the comprehension of source code. Such an approach provides the means to test new hypotheses addressing the characteristics of specific parts within the source code and paves the road for novel techniques for code review and adaptive e-learning.","PeriodicalId":426634,"journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124131038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Online code clones occur when code snippets are reused between software repositories and online resources such as GitHub and Stack Overflow. Previous works have shown that snippets from Stack Overflow are reused in other open-source projects and vice versa. Analyzing online code-reuse patterns could help identify outdated code, reveal developers' practices, and inform the design of new code search engines. This study analyzed JavaScript online code clones between Stack Overflow and GitHub repositories. We first developed a JavaScript code corpus to search for online clones. The clone search reported 12,579 online clones between 276,547 non-trivial, syntactically valid Stack Overflow snippets and 292 GitHub repositories. We manually classified the top 10% (1,257) of clone pairs into seven online clone patterns. We observed that around 70% of JavaScript snippets in Stack Overflow posts are copied from GitHub repositories or from other external sources. Moreover, only 30.59% of JavaScript snippets in Stack Overflow accepted answers could be considered reusable.
{"title":"An Exploratory Study of Analyzing JavaScript Online Code Clones","authors":"Md Rakib Hossain Misu, A. Satter","doi":"10.1145/3524610.3528390","DOIUrl":"https://doi.org/10.1145/3524610.3528390","url":null,"abstract":"Online code clones occur due to reusing code snippets in software repositories from online resources such as GitHub and Stack Overflow. Previous works have shown that snippets from Stack Overflow are reused in other open-source projects and vice versa. Analysis of online code reusing patterns could identify outdated code, understand developers' practices, and help to design new code search engines. This study analyzed JavaScript online code clones between Stack Overflow and GitHub repositories. We first developed a JavaScript code corpus to search online clones. The clone search results reported 12,579 online clones between 276,547 non-trivial syntactically validated Stack Overflow snippets and 292 GitHub repositories. We manually classified the top 10% (1257) pairs of clones in seven online clone patterns. We observed that around 70% of JavaScript snippets in Stack Overflow posts are copied from GitHub repositories or from other external sources. Moreover, only 30.59% of JavaScript Snippets in Stack Overflow accepted answers could be considered as reusable snippets.","PeriodicalId":426634,"journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)","volume":"394 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116020746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Code comments, i.e., the natural language text that describes code, are widely considered a key aid to program comprehension. Current literature approaches mainly focus on comment generation or comment update, and thus fall short of explaining which part of the code leads to a specific content in the comment. In this paper, we propose that addressing such a challenge can better facilitate code understanding. We propose Fosterer, which can build fine-grained semantic interactions between code statements and comment tokens. It not only leverages advanced deep learning techniques like cross-modal learning and contrastive learning, but also builds on pre-trained vision models. Specifically, it mimics the comprehension practice of developers, treating code statements as image patches and comments as texts, and uses contrastive learning to match the semantically related parts between the visual and textual information. Experiments on a large-scale manually labelled dataset show that our approach can achieve an F1-score of around 80%, and this performance exceeds a heuristic-based baseline by a large margin. We also find that Fosterer works efficiently: it needs only 1.5 seconds to infer the result for a code-comment pair. Furthermore, a user study demonstrates its usability: for 65% of cases, its prediction results are considered useful for improving code understanding. Therefore, our research sheds light on a promising direction for program comprehension.
{"title":"Fine-Grained Code-Comment Semantic Interaction Analysis","authors":"Mingyang Geng, Shangwen Wang, Dezun Dong, Shanzhi Gu, Fang Peng, Weijian Ruan, Xiangke Liao","doi":"10.1145/3524610.3527887","DOIUrl":"https://doi.org/10.1145/3524610.3527887","url":null,"abstract":"Code comment, i.e., the natural language text to describe code, is considered as a killer for program comprehension. Current literature approaches mainly focus on comment generation or comment update, and thus fall short on explaining which part of the code leads to a specific content in the comment. In this paper, we propose that addressing such a challenge can better facilitate code under-standing. We propose Fosterer, which can build fine-grained se-mantic interactions between code statements and comment tokens. It not only leverages the advanced deep learning techniques like cross-modal learning and contrastive learning, but also borrows the weapon of pre-trained vision models. Specifically, it mimics the comprehension practice of developers, treating code statements as image patches and comments as texts, and uses contrastive learning to match the semantically-related part between the visual and tex-tual information. Experiments on a large-scale manually-labelled dataset show that our approach can achieve an Fl-score around 80%, and such a performance exceeds a heuristic-based baseline to a large extent. We also find that Fosterer can work with a high efficiency, i.e., it only needs 1.5 seconds for inferring the results for a code-comment pair. Furthermore, a user study demonstrates its usability: for 65% cases, its prediction results are considered as useful for improving code understanding. Therefore, our research sheds light on a promising direction for program comprehension.","PeriodicalId":426634,"journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114799313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Static bug finders (also known as static code analyzers, e.g., FindBugs, SonarQube) have been widely adopted by developers to find bugs in real-world software projects. They apply predefined heuristic static analysis rules to scan the source code or binary code of a software project and report violations of these rules as warnings to be verified. However, the advantages of static bug finders are overshadowed by issues such as missed obvious bugs and false positives. To improve these tools, many techniques have been proposed to filter out reported false positives or to design new static analysis rules. Nevertheless, the under-performance of bug finders can also be caused by incorrectness in the rules the tools already contain, which has not yet been explored. In this work, we propose a differential testing approach to detect bugs in the rules of four widely used static bug finders, i.e., SonarQube, PMD, SpotBugs, and ErrorProne, and conduct a qualitative study of the bugs found. The experiment on 2,728 open-source projects reveals 46 bugs in the static bug finders, among which 30 are fixed or confirmed and the rest are awaiting confirmation. We also summarize 13 bug patterns in the static analysis rules based on their context and root causes, which can serve as a checklist for designing and implementing other rules and/or other tools. This study indicates that the commonly used static bug finders are not as reliable as one might expect. It not only demonstrates the effectiveness of our approach, but also highlights the need to keep improving the reliability of static bug finders.
{"title":"Find Bugs in Static Bug Finders","authors":"Junjie Wang, Yuchao Huang, Song Wang, Qing Wang","doi":"10.1145/3524610.3527899","DOIUrl":"https://doi.org/10.1145/3524610.3527899","url":null,"abstract":"Static bug finders (also known as static code analyzers, e.g., Find-Bugs, SonarQube) have been widely-adopted by developers to find bugs in real-world software projects. They leverage predefined heuristic static analysis rules to scan source code or binary code of a software project, and report violations to these rules as warnings to be verified. However, the advantages of static bug finders are overshadowed by such issues as uncovered obvious bugs, false positives, etc. To improve these tools, many techniques have been proposed to filter out false positives reported or design new static analysis rules. Nevertheless, the under-performance of bug finders can also be caused by the incorrectness of current rules contained in the static bug finders, which is not explored yet. In this work, we propose a differential testing approach to detect bugs in the rules of four widely-used static bug finders, i.e., SonarQube, PMD, SpotBugs, and ErrorProne, and conduct a qualitative study about the bugs found. The experiment on 2,728 open source projects reveals 46 bugs in the static bug finders, among which 30 are fixed or confirmed and the left are awaiting confirmation. We also summarize 13 bug patterns in the static analysis rules based on their context and root causes, which can serve as the checklist for designing and implementing other rules and/or in other tools. This study indicates that the commonly-used static bug finders are not as reliable as they might have been envisaged. It not only demonstrates the effectiveness of our approach, but also highlights the need to continue improving the reliability of the static bug finders.","PeriodicalId":426634,"journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114975791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
During software development, developers introduce code clones by reusing existing code to improve programming productivity. Considering the detrimental effects on software maintenance and evolution, many techniques have been proposed to detect code clones. Existing approaches are mainly used to detect clones written in the same programming language. However, it is common to develop programs with the same functionality in different programming languages to support various platforms. In this paper, we propose a new approach named C4 (Contrastive Cross-language Code Clone detection). It can effectively detect cross-language clones with learned representations. C4 exploits the pre-trained model CodeBERT to convert programs in different languages into high-dimensional vector representations. In addition, we fine-tune the C4 model with a contrastive learning objective that can effectively distinguish clone pairs from non-clone pairs. To evaluate the effectiveness of our approach, we conduct extensive experiments on the dataset proposed by CLCDSA. Experimental results show that C4 achieves scores of 0.94, 0.90, and 0.92 in terms of precision, recall, and F-measure, and substantially outperforms the state-of-the-art baselines.
{"title":"C4: Contrastive Cross-Language Code Clone Detection","authors":"Chenning Tao, Qi Zhan, Xing Hu, Xin Xia","doi":"10.1145/3524610.3527911","DOIUrl":"https://doi.org/10.1145/3524610.3527911","url":null,"abstract":"During software development, developers introduce code clones by reusing existing code to improve programming productivity. Considering the detrimental effects on software maintenance and evolution, many techniques are proposed to detect code clones. Existing approaches are mainly used to detect clones written in the same programming language. However, it is common to develop programs with the same functionality but in different programming languages to support various platforms. In this paper, we propose a new approach named C4, referring to Contrastive Cross-language Code Clone detection model. It can detect cross-language clones with learned representations effectively. C4 exploits the pre-trained model CodeBERT to convert programs in different languages into high-dimensional vector representations. In addition, we fine tune the C4 model through a constrastive learning objective that can effectively recognize clone pairs and non-clone pairs. To evaluate the effectiveness of our approach, we conduct extensive experiments on the dataset proposed by CLCDSA. Experimental results show that C4 achieves scores of 0.94, 0.90, and 0.92 in terms of precision, recall and F-measure and substantially outperforms the state-of-the-art baselines.","PeriodicalId":426634,"journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124965511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Software engineers depend heavily on software libraries and have to update their dependencies once vulnerabilities are found in them. Software Composition Analysis (SCA) helps developers identify vulnerable libraries used by an application. A key challenge is the identification of libraries related to a given vulnerability reported in the National Vulnerability Database (NVD), which may not explicitly indicate the affected libraries. Recently, researchers have tried to address the problem of identifying the libraries from an NVD report by treating it as an extreme multi-label learning (XML) problem, characterized by its large number of possible labels and severe data sparsity. The NVD report is provided as input, and a set of relevant libraries is returned as output. In this work, we evaluated multiple XML techniques. While previous work only evaluated a traditional XML technique, FastXML, we trained four other traditional XML models (DiSMEC, Parabel, Bonsai, ExtremeText) as well as two deep learning-based models (XML-CNN and LightXML). We compared both their effectiveness and the time cost of training and using the models for predictions. We find that, apart from DiSMEC and XML-CNN, recent XML models outperform the FastXML model by 3%-10% in terms of F1-scores on Top-k (k=1,2,3) predictions. Furthermore, we observe significant improvements in both the training and prediction time of these XML models, with the Bonsai and Parabel models achieving 627x and 589x faster training and 12x faster prediction than the FastXML baseline. We discuss the implications of our experimental results and highlight limitations for future work to address.
{"title":"Automated Identification of Libraries from Vulnerability Data: Can We Do Better?","authors":"S. A. Haryono, Hong Jin Kang, Abhishek Sharma, Asankhaya Sharma, A. Santosa, Angela Yi, D. Lo","doi":"10.1145/3524610.3527893","DOIUrl":"https://doi.org/10.1145/3524610.3527893","url":null,"abstract":"Software engineers depend heavily on software libraries and have to update their dependencies once vulnerabilities are found in them. Software Composition Analysis (SCA) helps developers identify vulnerable libraries used by an application. A key challenge is the identification of libraries related to a given reported vulnerability in the National Vulnerability Database (NVD), which may not ex-plicitly indicate the affected libraries. Recently, researchers have tried to address the problem of identifying the libraries from an NVD report by treating it as an extreme multi-label learning (XML) problem, characterized by its large number of possible labels and severe data sparsity. As input, the NVD report is provided, and as output, a set of relevant libraries is returned. In this work, we evaluated multiple XML techniques. While pre-vious work only evaluated a traditional XML technique, FastXML, we trained four other traditional XML models (DiSMEC, Parabel, Bonsai, ExtremeText) as well as two deep learning-based models (XML-CNN and LightXML). We compared both their effectiveness and the time cost of training and using the models for predictions. We find that other than DiSMEC and XML-CNN, recent XML mod-els outperform the FastXML model by 3%-10% in terms of F1-scores on Top-k (k=1,2,3) predictions. Furthermore, we observe significant improvements in both the training and prediction time of these XML models, with Bonsai and Parabel model achieving 627x and 589x faster training time and 12x faster prediction time from the FastXML baseline. We discuss the implications of our experimental results and highlight limitations for future work to address.","PeriodicalId":426634,"journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129753502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning smart contract representations can greatly facilitate the development of smart contracts in many tasks such as bug detection and clone detection. Existing approaches for learning program representations are difficult to apply to smart contracts, which suffer from insufficient data and significant homogenization. To overcome these challenges, in this paper we propose SRCL, a novel self-supervised approach for learning smart contract representations. Unlike existing supervised methods, which are tied to task-specific data labels, SRCL leverages large-scale unlabeled data through self-supervised learning of both local and global information of smart contracts. It automatically extracts structural sequences from abstract syntax trees (ASTs). Then, two discriminators are designed to guide the Transformer encoder to learn local and global semantic features of smart contracts. We evaluate SRCL on a dataset of 75,006 smart contracts collected from Etherscan. Experimental results show that SRCL considerably outperforms the state-of-the-art code representation models on three downstream tasks.
{"title":"Self-Supervised Learning of Smart Contract Representations","authors":"Shouliang Yang, Xiaodong Gu, Beijun Shen","doi":"10.1145/3524610.3527894","DOIUrl":"https://doi.org/10.1145/3524610.3527894","url":null,"abstract":"Learning smart contract representations can greatly facilitate the development of smart contracts in many tasks such as bug detection and clone detection. Existing approaches for learning program representations are difficult to apply to smart contracts which have insufficient data and significant homogenization. To overcome these challenges, in this paper, we propose SRCL, a novel, self-supervised approach for learning smart contract representations. Unlike ex-isting supervised methods, which are tied on task-specific data labels, SRCL leverages large-scale unlabeled data by self-supervised learning of both local and global information of smart contracts. It automatically extracts structural sequences from abstract syntax trees (ASTs). Then, two discriminators are designed to guide the Transformer encoder to learn local and global semantic features of smart contracts. We evaluate SRCL on a dataset of 75,006 smart contracts collected from Etherscan. Experimental results show that SRCL considerably outperforms the state-of-the-art code represen-tation models on three downstream tasks.","PeriodicalId":426634,"journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115269608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Trigger-action programming allows end users to write event-driven rules to automate smart devices and internet services. Users can create a trigger-action program (TAP) by specifying triggers and actions from a set of predefined functions, along with suitable data fields for those functions. Many trigger-action programming platforms have emerged as the paradigm's popularity grows, e.g., IFTTT, Microsoft Power Automate, and Samsung SmartThings. Despite their simplicity, composing trigger-action programs (TAPs) can still be challenging for end users due to the domain knowledge required and the enormous search space of possible trigger-action combinations. We propose RecipeGen, a new deep learning-based approach that leverages the Transformer sequence-to-sequence (seq2seq) architecture to generate TAPs at fine-grained, field-level granularity from natural language descriptions. Our approach adapts autoencoding pre-trained models to warm-start the encoder in the seq2seq model to boost generation performance. We have evaluated RecipeGen on real-world datasets from the IFTTT platform against the prior state-of-the-art approach on the TAP generation task. Our empirical evaluation shows that the overall improvement over the prior best results ranges from 9.5% to 26.5%. Our results also show that adopting a pre-trained autoencoding model boosts MRR@3 by a further 2.8%-10.8%. Furthermore, in the field-level generation setting, RecipeGen achieves 0.591 and 0.575 in terms of MRR@3 and BLEU scores, respectively.
{"title":"Accurate Generation of Trigger-Action Programs with Domain-Adapted Sequence-to-Sequence Learning","authors":"Imam Nur Bani Yusuf, Lingxiao Jiang, David Lo","doi":"10.1145/3524610.3527922","DOIUrl":"https://doi.org/10.1145/3524610.3527922","url":null,"abstract":"Trigger-action programming allows end users to write event-driven rules to automate smart devices and internet services. Users can create a trigger-action program (TAP) by specifying triggers and actions from a set of predefined functions along with suitable data fields for the functions. Many trigger-action programming platforms have emerged as the popularity grows, e.g., IFTTT, Microsoft Power Automate, and Samsung SmartThings. Despite their simplicity, composing trigger-action programs (TAPs) can still be challenging for end users due to the domain knowledge needed and enormous search space of many combinations of triggers and actions. We propose RecipeGen, a new deep learning-based approach that leverages Transformer sequence-to-sequence (seq2seq) architecture to generate TAPs on the fine-grained field-level granularity from natural language descriptions. Our approach adapts autoencoding pre-trained models to warm-start the encoder in the seq2seq model to boost the generation performance. We have evaluated RecipeGen on real-world datasets from the IFTTT platform against the prior state-of-the-art approach on the TAP generation task. Our empirical evaluation shows that the overall improvement against the prior best results ranges from 9.5%-26.5%. Our results also show that adopting a pre-trained autoencoding model boosts the MRR@3 further by 2.8%-10.8%. Further, in the field-level generation setting, RecipeGen achieves 0.591 and 0.575 in terms of MRR@3 and BLEU scores respectively.","PeriodicalId":426634,"journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)","volume":"125 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116192368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Current autograders for programming assignments are typically based on program output; they fall short in many ways: e.g., they do not carry out subjective evaluations such as judging code quality or checking whether the code follows instructor-specified constraints; this is still done manually by teaching assistants. In this paper, we tackle a specific aspect of such evaluation: verifying whether a program implements a specific algorithm that the instructor specified. An algorithm, e.g., bubble sort, can be coded in myriad different ways, but a human can always understand the code and spot, say, a bubble sort versus a selection sort. We develop and compare four approaches to do precisely this: given the source code of a program known to implement a certain functionality, identify the algorithm used, among a known set of algorithms. The approaches are based on code similarity, Support Vector Machines (SVMs) with tree or graph kernels, transformer neural architectures based on source code only (CodeBERT), and an extension of the latter that also incorporates code structure (GraphCodeBERT). Furthermore, we use an explainability model (LIME) to generate insights into why certain programs get certain labels. Results based on our datasets of sorting, searching, and shortest-path codes show that GraphCodeBERT, fine-tuned with scrambled source code, i.e., where identifiers are replaced consistently with arbitrary words, gives the best performance in algorithm identification, with accuracy of 96-99% depending on the functionality. Additionally, we add uncalled-function source code elimination to our pre-processing pipeline for test programs, to improve the accuracy of classifying obfuscated source code.
{"title":"Algorithm Identification in Programming Assignments","authors":"Pranshu Chourasia, Ganesh Ramakrishnan, V. Apte, Suraj Kumar","doi":"10.1145/3524610.3527914","DOIUrl":"https://doi.org/10.1145/3524610.3527914","url":null,"abstract":"Current autograders of programming assignments are typically program output based; they fall short in many ways: e.g. they do not carry out subjective evaluations such as code quality, or whether the code has followed any instructor specified constraints; this is still done manually by teaching assistants. In this paper, we tackle a specific aspect of such evaluation: to verify whether a program implements a specific algorithm that the instructor specified. An algorithm, e.g. bubble sort, can be coded in myriad different ways, but a human can always understand the code and spot, say a bubble sort, vs. a selection sort. We develop and compare four approaches to do precisely this: given the source code of a program known to implement a certain functionality, identify the algorithm used, among a known set of algorithms. The approaches are based on code similarity, Support Vector Machine (SVM) with tree or graph kernels, and transformer neural architectures based only source code (CodeBERT), and the extension of this that includes code structure (GraphCodeBERT). Furthermore, we use a model for explainability (LIME) to generate insights into why certain programs get certain labels. Results based on our datasets of sorting, searching and shortest path codes, show that GraphCodeBERT, fine-tuned with scrambled source code, i.e., where identifiers are replaced consistently with arbitrary words, gives the best performance in algorithm identification, with accuracy of 96-99% depending on the functionality. Additionally, we add uncalled function source code elimination to our pre-processing pipeline of test programs, to improve the accuracy of classification of obfuscated source code.","PeriodicalId":426634,"journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126075090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Software developers often use social media (such as Twitter) to share programming knowledge such as new tools, sample code snippets, and tips on programming. One of the topics they discuss is software libraries. Tweets may contain useful information about a library, and a good understanding of this information, e.g., developers' views on a library, can help in weighing the pros and cons of adopting it as well as in gauging the general sentiment towards it. However, it is not trivial to recognize whether a word actually refers to a library or to something else. For example, a tweet mentioning the word “pandas” may refer to the Python pandas library or to the animal. In this work, we created the first benchmark dataset and investigated the task of distinguishing whether a tweet refers to a programming library or something else. Recently, pre-trained Transformer models (PTMs) have achieved great success in the fields of natural language processing and computer vision. Therefore, we extensively evaluated a broad set of modern PTMs, including both general-purpose and domain-specific ones, on this library recognition task in tweets. Experimental results show that using PTMs can outperform the best-performing baseline methods by 5%-12% in terms of F1-score under within-, cross-, and mixed-library settings.
{"title":"Benchmarking Library Recognition in Tweets","authors":"Ting Zhang, Divyadharshini Chandrasekaran, Ferdian Thung, David Lo","doi":"10.1145/3524610.3527916","DOIUrl":"https://doi.org/10.1145/3524610.3527916","url":null,"abstract":"Software developers often use social media (such as Twitter) to share programming knowledge such as new tools, sample code snippets, and tips on programming. One of the topics they talk about is the software library. The tweets may contain useful information about a library. A good understanding of this information, e.g., on the developer's views regarding a library can be beneficial to weigh the pros and cons of using the library as well as the general sentiments towards the library. However, it is not trivial to recognize whether a word actually refers to a library or other meanings. For example, a tweet mentioning the word “pandas” may refer to the Python pandas library or to the animal. In this work, we created the first benchmark dataset and investigated the task to distinguish whether a tweet refers to a programming library or something else. Recently, the pre-trained Transformer models (PTMs) have achieved great success in the fields of natural language processing and computer vision. Therefore, we extensively evaluated a broad set of modern PTMs, including both general-purpose and domain-specific ones, to solve this programming library recognition task in tweets. Experimental results show that the use of PTM can outperform the best-performing baseline methods by 5% - 12% in terms of F1-score under within-, cross-, and mixed-library settings.","PeriodicalId":426634,"journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128363290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}