The comprehension of source code is a task inherent to many software development activities. Code change, code review, and debugging are examples of these activities that depend heavily on developers' understanding of the source code. This ability is threatened when developers' cognitive load approaches the limits of their working memory, which in turn affects their understanding and makes them more prone to errors. Measures capturing humans' behavior and changes in their physiological state have been proposed in a number of studies to investigate developers' cognitive load. However, the majority of the existing approaches operate at a coarse-grained task level, estimating the difficulty of the source code as a whole. Hence, they cannot be used to pinpoint its mentally demanding parts. We address this limitation in this paper through a non-intrusive approach based on eye-tracking. We collect users' behavioral and physiological features while they engage with source code and train a set of machine learning models to estimate the mentally demanding parts of code. The evaluation of our models returns F1, recall, accuracy, and precision scores of up to 85.65%, 84.25%, 86.24%, and 88.61%, respectively, when estimating the mentally demanding fragments of code. Our approach enables a fine-grained analysis of cognitive load and makes it possible to identify the parts that challenge the comprehension of source code. Such an approach provides the means to test new hypotheses addressing the characteristics of specific parts within the source code and paves the way for novel techniques for code review and adaptive e-learning.
{"title":"Estimating Developers' Cognitive Load at a Fine-grained Level Using Eye-tracking Measures","authors":"Amine Abbad Andaloussi, Thierry Sorg, Barbara Weber","doi":"10.1145/3524610.3527890","DOIUrl":"https://doi.org/10.1145/3524610.3527890","url":null,"abstract":"The comprehension of source code is a task inherent to many software development activities. Code change, code review and debugging are examples of these activities that depend heavily on developers' understanding of the source code. This ability is threatened when developers' cognitive load approaches the limits of their working memory, which in turn affects their understanding and makes them more prone to errors. Measures capturing humans' behavior and changes in their physiological state have been proposed in a number of studies to investigate developers' cognitive load. However, the majority of the existing approaches operate at a coarse-grained task level estimating the difficulty of the source code as a whole. Hence, they cannot be used to pinpoint the mentally demanding parts of it. We address this limitation in this paper through a non-intrusive approach based on eye-tracking. We collect users' behavioral and physiological features while they are engaging with source code and train a set of machine learning models to estimate the mentally demanding parts of code. The evaluation of our models returns F1, recall, accuracy and precision scores up to 85.65%, 84.25%, 86.24% and 88.61%, respectively, when estimating the mental demanding fragments of code. Our approach enables a fine-grained analysis of cognitive load and allows identifying the parts challenging the comprehension of source code. Such an approach provides the means to test new hypotheses addressing the characteristics of specific parts within the source code and paves the road for novel techniques for code review and adaptive e-learning.","PeriodicalId":426634,"journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124131038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Online code clones occur when code snippets are reused between software repositories and online resources such as GitHub and Stack Overflow. Previous works have shown that snippets from Stack Overflow are reused in other open-source projects and vice versa. Analyzing online code-reuse patterns could help identify outdated code, reveal developers' practices, and inform the design of new code search engines. This study analyzed JavaScript online code clones between Stack Overflow and GitHub repositories. We first developed a JavaScript code corpus to search for online clones. The clone search reported 12,579 online clones between 276,547 non-trivial, syntactically valid Stack Overflow snippets and 292 GitHub repositories. We manually classified the top 10% (1,257) of clone pairs into seven online clone patterns. We observed that around 70% of JavaScript snippets in Stack Overflow posts are copied from GitHub repositories or from other external sources. Moreover, only 30.59% of JavaScript snippets in Stack Overflow accepted answers could be considered reusable.
{"title":"An Exploratory Study of Analyzing JavaScript Online Code Clones","authors":"Md Rakib Hossain Misu, A. Satter","doi":"10.1145/3524610.3528390","DOIUrl":"https://doi.org/10.1145/3524610.3528390","url":null,"abstract":"Online code clones occur due to reusing code snippets in software repositories from online resources such as GitHub and Stack Overflow. Previous works have shown that snippets from Stack Overflow are reused in other open-source projects and vice versa. Analysis of online code reusing patterns could identify outdated code, understand developers' practices, and help to design new code search engines. This study analyzed JavaScript online code clones between Stack Overflow and GitHub repositories. We first developed a JavaScript code corpus to search online clones. The clone search results reported 12,579 online clones between 276,547 non-trivial syntactically validated Stack Overflow snippets and 292 GitHub repositories. We manually classified the top 10% (1257) pairs of clones in seven online clone patterns. We observed that around 70% of JavaScript snippets in Stack Overflow posts are copied from GitHub repositories or from other external sources. Moreover, only 30.59% of JavaScript Snippets in Stack Overflow accepted answers could be considered as reusable snippets.","PeriodicalId":426634,"journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)","volume":"394 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116020746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Code comments, i.e., the natural language text that describes code, are widely considered a key aid to program comprehension. Current literature approaches mainly focus on comment generation or comment update, and thus fall short of explaining which part of the code leads to a specific content in the comment. In this paper, we propose that addressing such a challenge can better facilitate code understanding. We propose Fosterer, which can build fine-grained semantic interactions between code statements and comment tokens. It not only leverages advanced deep learning techniques like cross-modal learning and contrastive learning, but also builds on pre-trained vision models. Specifically, it mimics the comprehension practice of developers, treating code statements as image patches and comments as texts, and uses contrastive learning to match the semantically related parts between the visual and textual information. Experiments on a large-scale manually labelled dataset show that our approach can achieve an F1-score of around 80%, and this performance exceeds a heuristic-based baseline by a large margin. We also find that Fosterer works efficiently: it needs only 1.5 seconds to infer the result for a code-comment pair. Furthermore, a user study demonstrates its usability: for 65% of cases, its prediction results are considered useful for improving code understanding. Therefore, our research sheds light on a promising direction for program comprehension.
{"title":"Fine-Grained Code-Comment Semantic Interaction Analysis","authors":"Mingyang Geng, Shangwen Wang, Dezun Dong, Shanzhi Gu, Fang Peng, Weijian Ruan, Xiangke Liao","doi":"10.1145/3524610.3527887","DOIUrl":"https://doi.org/10.1145/3524610.3527887","url":null,"abstract":"Code comment, i.e., the natural language text to describe code, is considered as a killer for program comprehension. Current literature approaches mainly focus on comment generation or comment update, and thus fall short on explaining which part of the code leads to a specific content in the comment. In this paper, we propose that addressing such a challenge can better facilitate code under-standing. We propose Fosterer, which can build fine-grained se-mantic interactions between code statements and comment tokens. It not only leverages the advanced deep learning techniques like cross-modal learning and contrastive learning, but also borrows the weapon of pre-trained vision models. Specifically, it mimics the comprehension practice of developers, treating code statements as image patches and comments as texts, and uses contrastive learning to match the semantically-related part between the visual and tex-tual information. Experiments on a large-scale manually-labelled dataset show that our approach can achieve an Fl-score around 80%, and such a performance exceeds a heuristic-based baseline to a large extent. We also find that Fosterer can work with a high efficiency, i.e., it only needs 1.5 seconds for inferring the results for a code-comment pair. Furthermore, a user study demonstrates its usability: for 65% cases, its prediction results are considered as useful for improving code understanding. Therefore, our research sheds light on a promising direction for program comprehension.","PeriodicalId":426634,"journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114799313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Static bug finders (also known as static code analyzers, e.g., FindBugs, SonarQube) have been widely adopted by developers to find bugs in real-world software projects. They apply predefined heuristic static analysis rules to scan the source code or binary code of a software project and report violations of these rules as warnings to be verified. However, the advantages of static bug finders are overshadowed by issues such as missed obvious bugs and false positives. To improve these tools, many techniques have been proposed to filter out reported false positives or to design new static analysis rules. Nevertheless, the under-performance of bug finders can also be caused by incorrectness in the rules the tools already contain, which has not yet been explored. In this work, we propose a differential testing approach to detect bugs in the rules of four widely used static bug finders, i.e., SonarQube, PMD, SpotBugs, and ErrorProne, and conduct a qualitative study of the bugs found. The experiment on 2,728 open-source projects reveals 46 bugs in the static bug finders, among which 30 are fixed or confirmed and the rest are awaiting confirmation. We also summarize 13 bug patterns in the static analysis rules based on their context and root causes, which can serve as a checklist for designing and implementing other rules and/or other tools. This study indicates that the commonly used static bug finders are not as reliable as one might expect. It not only demonstrates the effectiveness of our approach, but also highlights the need to keep improving the reliability of static bug finders.
{"title":"Find Bugs in Static Bug Finders","authors":"Junjie Wang, Yuchao Huang, Song Wang, Qing Wang","doi":"10.1145/3524610.3527899","DOIUrl":"https://doi.org/10.1145/3524610.3527899","url":null,"abstract":"Static bug finders (also known as static code analyzers, e.g., Find-Bugs, SonarQube) have been widely-adopted by developers to find bugs in real-world software projects. They leverage predefined heuristic static analysis rules to scan source code or binary code of a software project, and report violations to these rules as warnings to be verified. However, the advantages of static bug finders are overshadowed by such issues as uncovered obvious bugs, false positives, etc. To improve these tools, many techniques have been proposed to filter out false positives reported or design new static analysis rules. Nevertheless, the under-performance of bug finders can also be caused by the incorrectness of current rules contained in the static bug finders, which is not explored yet. In this work, we propose a differential testing approach to detect bugs in the rules of four widely-used static bug finders, i.e., SonarQube, PMD, SpotBugs, and ErrorProne, and conduct a qualitative study about the bugs found. The experiment on 2,728 open source projects reveals 46 bugs in the static bug finders, among which 30 are fixed or confirmed and the left are awaiting confirmation. We also summarize 13 bug patterns in the static analysis rules based on their context and root causes, which can serve as the checklist for designing and implementing other rules and/or in other tools. This study indicates that the commonly-used static bug finders are not as reliable as they might have been envisaged. It not only demonstrates the effectiveness of our approach, but also highlights the need to continue improving the reliability of the static bug finders.","PeriodicalId":426634,"journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114975791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
During software development, developers introduce code clones by reusing existing code to improve programming productivity. Considering the detrimental effects on software maintenance and evolution, many techniques have been proposed to detect code clones. Existing approaches are mainly used to detect clones written in the same programming language. However, it is common to develop programs with the same functionality in different programming languages to support various platforms. In this paper, we propose a new approach named C4 (Contrastive Cross-language Code Clone detection). It can effectively detect cross-language clones with learned representations. C4 exploits the pre-trained model CodeBERT to convert programs in different languages into high-dimensional vector representations. In addition, we fine-tune the C4 model with a contrastive learning objective that can effectively distinguish clone pairs from non-clone pairs. To evaluate the effectiveness of our approach, we conduct extensive experiments on the dataset proposed by CLCDSA. Experimental results show that C4 achieves scores of 0.94, 0.90, and 0.92 in terms of precision, recall, and F-measure, and substantially outperforms the state-of-the-art baselines.
{"title":"C4: Contrastive Cross-Language Code Clone Detection","authors":"Chenning Tao, Qi Zhan, Xing Hu, Xin Xia","doi":"10.1145/3524610.3527911","DOIUrl":"https://doi.org/10.1145/3524610.3527911","url":null,"abstract":"During software development, developers introduce code clones by reusing existing code to improve programming productivity. Considering the detrimental effects on software maintenance and evolution, many techniques are proposed to detect code clones. Existing approaches are mainly used to detect clones written in the same programming language. However, it is common to develop programs with the same functionality but in different programming languages to support various platforms. In this paper, we propose a new approach named C4, referring to Contrastive Cross-language Code Clone detection model. It can detect cross-language clones with learned representations effectively. C4 exploits the pre-trained model CodeBERT to convert programs in different languages into high-dimensional vector representations. In addition, we fine tune the C4 model through a constrastive learning objective that can effectively recognize clone pairs and non-clone pairs. To evaluate the effectiveness of our approach, we conduct extensive experiments on the dataset proposed by CLCDSA. Experimental results show that C4 achieves scores of 0.94, 0.90, and 0.92 in terms of precision, recall and F-measure and substantially outperforms the state-of-the-art baselines.","PeriodicalId":426634,"journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124965511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Software engineers depend heavily on software libraries and have to update their dependencies once vulnerabilities are found in them. Software Composition Analysis (SCA) helps developers identify vulnerable libraries used by an application. A key challenge is the identification of libraries related to a given vulnerability reported in the National Vulnerability Database (NVD), which may not explicitly indicate the affected libraries. Recently, researchers have tried to address the problem of identifying the libraries from an NVD report by treating it as an extreme multi-label learning (XML) problem, characterized by its large number of possible labels and severe data sparsity. The NVD report is provided as input, and a set of relevant libraries is returned as output. In this work, we evaluated multiple XML techniques. While previous work only evaluated a traditional XML technique, FastXML, we trained four other traditional XML models (DiSMEC, Parabel, Bonsai, ExtremeText) as well as two deep learning-based models (XML-CNN and LightXML). We compared both their effectiveness and the time cost of training and using the models for predictions. We find that, apart from DiSMEC and XML-CNN, recent XML models outperform the FastXML model by 3%-10% in terms of F1-scores on Top-k (k=1,2,3) predictions. Furthermore, we observe significant improvements in both the training and prediction time of these XML models, with the Bonsai and Parabel models achieving 627x and 589x faster training and 12x faster prediction than the FastXML baseline. We discuss the implications of our experimental results and highlight limitations for future work to address.
{"title":"Automated Identification of Libraries from Vulnerability Data: Can We Do Better?","authors":"S. A. Haryono, Hong Jin Kang, Abhishek Sharma, Asankhaya Sharma, A. Santosa, Angela Yi, D. Lo","doi":"10.1145/3524610.3527893","DOIUrl":"https://doi.org/10.1145/3524610.3527893","url":null,"abstract":"Software engineers depend heavily on software libraries and have to update their dependencies once vulnerabilities are found in them. Software Composition Analysis (SCA) helps developers identify vulnerable libraries used by an application. A key challenge is the identification of libraries related to a given reported vulnerability in the National Vulnerability Database (NVD), which may not ex-plicitly indicate the affected libraries. Recently, researchers have tried to address the problem of identifying the libraries from an NVD report by treating it as an extreme multi-label learning (XML) problem, characterized by its large number of possible labels and severe data sparsity. As input, the NVD report is provided, and as output, a set of relevant libraries is returned. In this work, we evaluated multiple XML techniques. While pre-vious work only evaluated a traditional XML technique, FastXML, we trained four other traditional XML models (DiSMEC, Parabel, Bonsai, ExtremeText) as well as two deep learning-based models (XML-CNN and LightXML). We compared both their effectiveness and the time cost of training and using the models for predictions. We find that other than DiSMEC and XML-CNN, recent XML mod-els outperform the FastXML model by 3%-10% in terms of F1-scores on Top-k (k=1,2,3) predictions. Furthermore, we observe significant improvements in both the training and prediction time of these XML models, with Bonsai and Parabel model achieving 627x and 589x faster training time and 12x faster prediction time from the FastXML baseline. We discuss the implications of our experimental results and highlight limitations for future work to address.","PeriodicalId":426634,"journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129753502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning smart contract representations can greatly facilitate the development of smart contracts in many tasks such as bug detection and clone detection. Existing approaches for learning program representations are difficult to apply to smart contracts, which suffer from insufficient data and significant homogenization. To overcome these challenges, in this paper we propose SRCL, a novel self-supervised approach for learning smart contract representations. Unlike existing supervised methods, which are tied to task-specific data labels, SRCL leverages large-scale unlabeled data through self-supervised learning of both local and global information of smart contracts. It automatically extracts structural sequences from abstract syntax trees (ASTs). Then, two discriminators are designed to guide the Transformer encoder to learn local and global semantic features of smart contracts. We evaluate SRCL on a dataset of 75,006 smart contracts collected from Etherscan. Experimental results show that SRCL considerably outperforms the state-of-the-art code representation models on three downstream tasks.
{"title":"Self-Supervised Learning of Smart Contract Representations","authors":"Shouliang Yang, Xiaodong Gu, Beijun Shen","doi":"10.1145/3524610.3527894","DOIUrl":"https://doi.org/10.1145/3524610.3527894","url":null,"abstract":"Learning smart contract representations can greatly facilitate the development of smart contracts in many tasks such as bug detection and clone detection. Existing approaches for learning program representations are difficult to apply to smart contracts which have insufficient data and significant homogenization. To overcome these challenges, in this paper, we propose SRCL, a novel, self-supervised approach for learning smart contract representations. Unlike ex-isting supervised methods, which are tied on task-specific data labels, SRCL leverages large-scale unlabeled data by self-supervised learning of both local and global information of smart contracts. It automatically extracts structural sequences from abstract syntax trees (ASTs). Then, two discriminators are designed to guide the Transformer encoder to learn local and global semantic features of smart contracts. We evaluate SRCL on a dataset of 75,006 smart contracts collected from Etherscan. Experimental results show that SRCL considerably outperforms the state-of-the-art code represen-tation models on three downstream tasks.","PeriodicalId":426634,"journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115269608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Trigger-action programming allows end users to write event-driven rules to automate smart devices and internet services. Users can create a trigger-action program (TAP) by specifying triggers and actions from a set of predefined functions, along with suitable data fields for those functions. Many trigger-action programming platforms have emerged as the paradigm's popularity grows, e.g., IFTTT, Microsoft Power Automate, and Samsung SmartThings. Despite their simplicity, composing trigger-action programs (TAPs) can still be challenging for end users due to the domain knowledge required and the enormous search space of possible trigger-action combinations. We propose RecipeGen, a new deep learning-based approach that leverages the Transformer sequence-to-sequence (seq2seq) architecture to generate TAPs at fine-grained, field-level granularity from natural language descriptions. Our approach adapts autoencoding pre-trained models to warm-start the encoder in the seq2seq model to boost generation performance. We have evaluated RecipeGen on real-world datasets from the IFTTT platform against the prior state-of-the-art approach on the TAP generation task. Our empirical evaluation shows that the overall improvement over the prior best results ranges from 9.5% to 26.5%. Our results also show that adopting a pre-trained autoencoding model boosts MRR@3 by a further 2.8%-10.8%. Furthermore, in the field-level generation setting, RecipeGen achieves 0.591 and 0.575 in terms of MRR@3 and BLEU scores, respectively.
{"title":"Accurate Generation of Trigger-Action Programs with Domain-Adapted Sequence-to-Sequence Learning","authors":"Imam Nur Bani Yusuf, Lingxiao Jiang, David Lo","doi":"10.1145/3524610.3527922","DOIUrl":"https://doi.org/10.1145/3524610.3527922","url":null,"abstract":"Trigger-action programming allows end users to write event-driven rules to automate smart devices and internet services. Users can create a trigger-action program (TAP) by specifying triggers and actions from a set of predefined functions along with suitable data fields for the functions. Many trigger-action programming platforms have emerged as the popularity grows, e.g., IFTTT, Microsoft Power Automate, and Samsung SmartThings. Despite their simplicity, composing trigger-action programs (TAPs) can still be challenging for end users due to the domain knowledge needed and enormous search space of many combinations of triggers and actions. We propose RecipeGen, a new deep learning-based approach that leverages Transformer sequence-to-sequence (seq2seq) architecture to generate TAPs on the fine-grained field-level granularity from natural language descriptions. Our approach adapts autoencoding pre-trained models to warm-start the encoder in the seq2seq model to boost the generation performance. We have evaluated RecipeGen on real-world datasets from the IFTTT platform against the prior state-of-the-art approach on the TAP generation task. Our empirical evaluation shows that the overall improvement against the prior best results ranges from 9.5%-26.5%. Our results also show that adopting a pre-trained autoencoding model boosts the MRR@3 further by 2.8%-10.8%. Further, in the field-level generation setting, RecipeGen achieves 0.591 and 0.575 in terms of MRR@3 and BLEU scores respectively.","PeriodicalId":426634,"journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)","volume":"125 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116192368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Current autograders for programming assignments are typically based on program output; they fall short in many ways: e.g., they do not carry out subjective evaluations such as judging code quality or checking whether the code follows instructor-specified constraints; this is still done manually by teaching assistants. In this paper, we tackle a specific aspect of such evaluation: verifying whether a program implements a specific algorithm that the instructor specified. An algorithm, e.g., bubble sort, can be coded in myriad different ways, but a human can always understand the code and spot, say, a bubble sort versus a selection sort. We develop and compare four approaches to do precisely this: given the source code of a program known to implement a certain functionality, identify the algorithm used, among a known set of algorithms. The approaches are based on code similarity, Support Vector Machines (SVMs) with tree or graph kernels, transformer neural architectures based on source code only (CodeBERT), and an extension of the latter that also incorporates code structure (GraphCodeBERT). Furthermore, we use an explainability model (LIME) to generate insights into why certain programs get certain labels. Results based on our datasets of sorting, searching, and shortest-path codes show that GraphCodeBERT, fine-tuned with scrambled source code, i.e., where identifiers are replaced consistently with arbitrary words, gives the best performance in algorithm identification, with accuracy of 96-99% depending on the functionality. Additionally, we add uncalled-function source code elimination to our pre-processing pipeline for test programs, to improve the accuracy of classifying obfuscated source code.
{"title":"Algorithm Identification in Programming Assignments","authors":"Pranshu Chourasia, Ganesh Ramakrishnan, V. Apte, Suraj Kumar","doi":"10.1145/3524610.3527914","DOIUrl":"https://doi.org/10.1145/3524610.3527914","url":null,"abstract":"Current autograders of programming assignments are typically program output based; they fall short in many ways: e.g. they do not carry out subjective evaluations such as code quality, or whether the code has followed any instructor specified constraints; this is still done manually by teaching assistants. In this paper, we tackle a specific aspect of such evaluation: to verify whether a program implements a specific algorithm that the instructor specified. An algorithm, e.g. bubble sort, can be coded in myriad different ways, but a human can always understand the code and spot, say a bubble sort, vs. a selection sort. We develop and compare four approaches to do precisely this: given the source code of a program known to implement a certain functionality, identify the algorithm used, among a known set of algorithms. The approaches are based on code similarity, Support Vector Machine (SVM) with tree or graph kernels, and transformer neural architectures based only source code (CodeBERT), and the extension of this that includes code structure (GraphCodeBERT). Furthermore, we use a model for explainability (LIME) to generate insights into why certain programs get certain labels. Results based on our datasets of sorting, searching and shortest path codes, show that GraphCodeBERT, fine-tuned with scrambled source code, i.e., where identifiers are replaced consistently with arbitrary words, gives the best performance in algorithm identification, with accuracy of 96-99% depending on the functionality. Additionally, we add uncalled function source code elimination to our pre-processing pipeline of test programs, to improve the accuracy of classification of obfuscated source code.","PeriodicalId":426634,"journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126075090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Software developers often use social media (such as Twitter) to share programming knowledge such as new tools, sample code snippets, and tips on programming. One of the topics they discuss is software libraries. Tweets may contain useful information about a library, and a good understanding of this information, e.g., developers' views on a library, can help in weighing the pros and cons of adopting it as well as in gauging the general sentiment towards it. However, it is not trivial to recognize whether a word actually refers to a library or to something else. For example, a tweet mentioning the word “pandas” may refer to the Python pandas library or to the animal. In this work, we created the first benchmark dataset and investigated the task of distinguishing whether a tweet refers to a programming library or something else. Recently, pre-trained Transformer models (PTMs) have achieved great success in the fields of natural language processing and computer vision. Therefore, we extensively evaluated a broad set of modern PTMs, including both general-purpose and domain-specific ones, on this library recognition task in tweets. Experimental results show that using PTMs can outperform the best-performing baseline methods by 5%-12% in terms of F1-score under within-, cross-, and mixed-library settings.
{"title":"Benchmarking Library Recognition in Tweets","authors":"Ting Zhang, Divyadharshini Chandrasekaran, Ferdian Thung, David Lo","doi":"10.1145/3524610.3527916","DOIUrl":"https://doi.org/10.1145/3524610.3527916","url":null,"abstract":"Software developers often use social media (such as Twitter) to share programming knowledge such as new tools, sample code snippets, and tips on programming. One of the topics they talk about is the software library. The tweets may contain useful information about a library. A good understanding of this information, e.g., on the developer's views regarding a library can be beneficial to weigh the pros and cons of using the library as well as the general sentiments towards the library. However, it is not trivial to recognize whether a word actually refers to a library or other meanings. For example, a tweet mentioning the word “pandas” may refer to the Python pandas library or to the animal. In this work, we created the first benchmark dataset and investigated the task to distinguish whether a tweet refers to a programming library or something else. Recently, the pre-trained Transformer models (PTMs) have achieved great success in the fields of natural language processing and computer vision. Therefore, we extensively evaluated a broad set of modern PTMs, including both general-purpose and domain-specific ones, to solve this programming library recognition task in tweets. Experimental results show that the use of PTM can outperform the best-performing baseline methods by 5% - 12% in terms of F1-score under within-, cross-, and mixed-library settings.","PeriodicalId":426634,"journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128363290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}