Fine-Grained Code-Comment Semantic Interaction Analysis

2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC) Pub Date : 2022-05-01 DOI:10.1145/3524610.3527887

Mingyang Geng, Shangwen Wang, Dezun Dong, Shanzhi Gu, Fang Peng, Weijian Ruan, Xiangke Liao

{"title":"Fine-Grained Code-Comment Semantic Interaction Analysis","authors":"Mingyang Geng, Shangwen Wang, Dezun Dong, Shanzhi Gu, Fang Peng, Weijian Ruan, Xiangke Liao","doi":"10.1145/3524610.3527887","DOIUrl":null,"url":null,"abstract":"Code comment, i.e., the natural language text to describe code, is considered as a killer for program comprehension. Current literature approaches mainly focus on comment generation or comment update, and thus fall short on explaining which part of the code leads to a specific content in the comment. In this paper, we propose that addressing such a challenge can better facilitate code under-standing. We propose Fosterer, which can build fine-grained se-mantic interactions between code statements and comment tokens. It not only leverages the advanced deep learning techniques like cross-modal learning and contrastive learning, but also borrows the weapon of pre-trained vision models. Specifically, it mimics the comprehension practice of developers, treating code statements as image patches and comments as texts, and uses contrastive learning to match the semantically-related part between the visual and tex-tual information. Experiments on a large-scale manually-labelled dataset show that our approach can achieve an Fl-score around 80%, and such a performance exceeds a heuristic-based baseline to a large extent. We also find that Fosterer can work with a high efficiency, i.e., it only needs 1.5 seconds for inferring the results for a code-comment pair. Furthermore, a user study demonstrates its usability: for 65% cases, its prediction results are considered as useful for improving code understanding. Therefore, our research sheds light on a promising direction for program comprehension.","PeriodicalId":426634,"journal":{"name":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3524610.3527887","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 3

Abstract

Code comment, i.e., the natural language text to describe code, is considered as a killer for program comprehension. Current literature approaches mainly focus on comment generation or comment update, and thus fall short on explaining which part of the code leads to a specific content in the comment. In this paper, we propose that addressing such a challenge can better facilitate code under-standing. We propose Fosterer, which can build fine-grained se-mantic interactions between code statements and comment tokens. It not only leverages the advanced deep learning techniques like cross-modal learning and contrastive learning, but also borrows the weapon of pre-trained vision models. Specifically, it mimics the comprehension practice of developers, treating code statements as image patches and comments as texts, and uses contrastive learning to match the semantically-related part between the visual and tex-tual information. Experiments on a large-scale manually-labelled dataset show that our approach can achieve an Fl-score around 80%, and such a performance exceeds a heuristic-based baseline to a large extent. We also find that Fosterer can work with a high efficiency, i.e., it only needs 1.5 seconds for inferring the results for a code-comment pair. Furthermore, a user study demonstrates its usability: for 65% cases, its prediction results are considered as useful for improving code understanding. Therefore, our research sheds light on a promising direction for program comprehension.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

细粒度代码-注释语义交互分析

代码注释，即描述代码的自然语言文本，被认为是程序理解的杀手。当前的文献方法主要关注于注释生成或注释更新，因此无法解释代码的哪一部分导致了注释中的特定内容。在本文中，我们提出解决这样的挑战可以更好地促进代码理解。我们建议使用Fosterer，它可以在代码语句和注释令牌之间构建细粒度的语义交互。它不仅利用了跨模态学习和对比学习等先进的深度学习技术，还借用了预训练视觉模型的武器。具体来说，它模仿了开发人员的理解实践，将代码语句视为图像补丁，将注释视为文本，并使用对比学习来匹配视觉和文本信息之间的语义相关部分。在大规模人工标记数据集上的实验表明，我们的方法可以达到80%左右的l-score，这种性能在很大程度上超过了基于启发式的基线。我们还发现，Fosterer的工作效率很高，也就是说，它只需要1.5秒就可以推断出代码注释对的结果。此外，一项用户研究证明了它的可用性:在65%的情况下，它的预测结果被认为对提高代码理解有用。因此，我们的研究为程序理解指明了一个有希望的方向。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)

自引率

0.00%

发文量

期刊最新文献

Context-based Cluster Fault Localization Fine-Grained Code-Comment Semantic Interaction Analysis Find Bugs in Static Bug Finders Self-Supervised Learning of Smart Contract Representations An Exploratory Study of Analyzing JavaScript Online Code Clones