MMF3: Neural Code Summarization Based on Multi-Modal Fine-Grained Feature Fusion

Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement
Pub Date: 2022-09-19
DOI: 10.1145/3544902.3546251

Zheng Ma, Yuexiu Gao, Lei Lyu, Chen Lyu


Citations: 1

Abstract

Background: Code summarization automatically generates a natural language description of the function that a piece of source code implements. A comprehensive code representation is critical to this task. However, most existing approaches use coarse-grained fusion to integrate multi-modal features: they represent different modalities of a piece of code, such as an Abstract Syntax Tree (AST) and a token sequence, as two embeddings and then fuse them at the AST/code level. Such coarse integration makes it difficult to learn correlations between fine-grained code elements across modalities. Aims: This study aims to improve summarization quality by accurately aligning and fully fusing the semantic and syntactic structure information of source code at the node/token level. Method: We propose a Multi-Modal Fine-grained Feature Fusion approach (MMF3) for neural code summarization, built on the Transformer architecture. In particular, we introduce a novel fusion method that fuses multiple code modalities at the token and node levels. We use it to fuse information from the token and AST modalities and apply the fused features to code summarization. Results: We conduct experiments on one Java dataset and one Python dataset and evaluate the generated summaries using four metrics. The results show that 1) our model outperforms the current state-of-the-art models, and 2) in ablation experiments, the proposed fine-grained fusion method effectively improves the accuracy of the generated summaries. Conclusion: MMF3 can mine the relationships between cross-modal elements and perform accurate fine-grained, element-level alignment and fusion accordingly. As a result, it provides more clues for improving the accuracy of the generated code summaries.
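The contrast the abstract draws between coarse-grained and fine-grained fusion can be illustrated with a small sketch. The abstract does not specify MMF3's fusion mechanism, so everything below is an assumption made for illustration: the function names are hypothetical, the coarse variant is mean pooling followed by concatenation, and the fine-grained variant is plain scaled dot-product cross-attention in which each token attends over all AST nodes. It is not the authors' implementation.

```python
from __future__ import annotations

import math


def dot(a: list[float], b: list[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(vectors: list[list[float]]) -> list[float]:
    """Average a list of vectors into a single vector."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def coarse_fusion(tokens: list[list[float]], nodes: list[list[float]]) -> list[float]:
    """Assumed coarse baseline: one pooled vector per modality, concatenated."""
    return mean_pool(tokens) + mean_pool(nodes)


def fine_grained_fusion(tokens: list[list[float]], nodes: list[list[float]]) -> list[list[float]]:
    """Assumed fine-grained variant: each token attends over every AST node.

    Returns one fused vector per token (token embedding concatenated with
    its attention-weighted AST context), so element-level links survive.
    """
    scale = math.sqrt(len(tokens[0]))
    fused = []
    for token in tokens:
        weights = softmax([dot(token, node) / scale for node in nodes])
        context = [
            sum(w * node[i] for w, node in zip(weights, nodes))
            for i in range(len(token))
        ]
        fused.append(token + context)
    return fused


# Toy 2-dimensional embeddings (made-up values, not from the paper).
token_vecs = [[1.0, 0.0], [0.0, 1.0]]
node_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

coarse = coarse_fusion(token_vecs, node_vecs)
fine = fine_grained_fusion(token_vecs, node_vecs)
```

In this toy setting, the coarse variant collapses the whole snippet into a single vector, while the fine-grained variant keeps one fused vector per token, each weighted toward the AST nodes it most resembles. That per-element alignment is the property the abstract attributes to fusion at the token and node levels.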
