Combining Global and Local Representations of Source Code for Method Naming
Cong Zhou, Li Kuang
2022 26th International Conference on Engineering of Complex Computer Systems (ICECCS), published 2022-03-01
DOI: 10.1109/ICECCS54210.2022.00026
https://doi.org/10.1109/ICECCS54210.2022.00026
Citation count: 0
Abstract
Source code is complex, structured data. Recent models learn code representations using either global or local aggregation. Global encoding connects all code tokens directly but neglects the graph structure. Local encoding captures the graph structure by focusing on neighboring nodes but fails to capture long-range dependencies. In this work, we bring the two encoding strategies together and investigate models that combine global and local representations of code to learn better code representations. Specifically, we modify the layer structure of the sequence-to-sequence model to incorporate a structured model in the encoder and in the decoder, respectively. To explore different ways of integrating the two, we propose four models for method naming. In an extensive evaluation, we demonstrate that our models significantly improve results on a well-studied method naming dataset, achieving a ROUGE-1 score of 54.1, a ROUGE-2 score of 26.7, and a ROUGE-L score of 54.3, outperforming state-of-the-art models by 2.7, 1.7, and 4.3 points, respectively. Our data and code are available at https://github.com/zc-work/CGLNaming.
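The contrast between global and local aggregation drawn in the abstract can be illustrated with a toy sketch. This is not the authors' architecture or any of their four models: the function names (`global_aggregate`, `local_aggregate`, `combine`), the unparameterized dot-product attention, the neighbor averaging, and the concatenation step are all assumptions chosen only to show why the two views carry different information.

```python
import math

def global_aggregate(vectors):
    """Global view (assumed form): each node attends to ALL nodes via softmax dot-product attention."""
    out = []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) for k in vectors]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append([sum(w * v[d] for w, v in zip(weights, vectors))
                    for d in range(len(q))])
    return out

def local_aggregate(vectors, edges):
    """Local view (assumed form): each node averages itself and its direct graph neighbors."""
    neighbors = {i: {i} for i in range(len(vectors))}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    out = []
    for i in range(len(vectors)):
        group = [vectors[j] for j in sorted(neighbors[i])]
        out.append([sum(v[d] for v in group) / len(group)
                    for d in range(len(vectors[i]))])
    return out

def combine(vectors, edges):
    """Hypothetical combination: concatenate the global and local view of each node."""
    global_view = global_aggregate(vectors)
    local_view = local_aggregate(vectors, edges)
    return [g + l for g, l in zip(global_view, local_view)]

# Toy input: four nodes with 2-d features, connected as a chain 0-1-2-3.
nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
chain = [(0, 1), (1, 2), (2, 3)]
combined = combine(nodes, chain)
```

In this sketch, changing the features of node 3 leaves the local view of node 0 untouched, because the two nodes are three hops apart, while the global view of node 0 changes immediately. Conversely, the global view never consults `chain`, so it ignores the graph structure entirely.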