CORES: COde REpresentation Summarization for Code Search

IEEE Transactions on Consumer Electronics (IF 4.3; Zone 2, Computer Science; JCR Q1, Engineering, Electrical & Electronic) Pub Date: 2024-08-16 DOI: 10.1109/TCE.2024.3445139
Xu Zhang;Xiaoyu Hu;Deyu Zhou
IEEE Transactions on Consumer Electronics, vol. 70, no. 3, pp. 6095-6104. Journal Article. https://ieeexplore.ieee.org/document/10638144/
Citations: 0

Abstract

As the consumer electronics market grows, the software development industry faces new opportunities and is paying more attention to code retrieval techniques that improve efficiency and reduce costs. Code search aims to retrieve reusable code from large repositories given a search query that states specific requirements. Recently, approaches based on pre-trained models have become popular because they capture the semantic representations of code snippets and search queries accurately. However, such approaches ignore the inconsistency between code and query statements caused by redundant tokens in code snippets, such as definitions and punctuation marks, which hinders matching accuracy. To address this drawback, this paper proposes two strategies based on explicit or implicit code representation summarization. Summarizing the code representation removes redundancy in the code and alleviates the inconsistency between code and query statements. In the explicit strategy, different views of contextual information are obtained and summarized using pyramidal dilated convolution at different scales. In the implicit strategy, a covariance constraint is applied directly to the code representation to reduce redundancy. Experimental results on six benchmark datasets show that both strategies outperform the current state-of-the-art model CORES by 1.2% in average MRR.
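
The abstract reports results as average MRR (mean reciprocal rank), the usual ranking metric for code search: for each query, take the reciprocal of the rank at which the correct code snippet appears, then average over all queries. The following is a minimal sketch of that computation only, not the paper's evaluation code. The function names, the snippet identifiers, and the toy rankings are all hypothetical and chosen purely for illustration.

```python
from typing import Sequence


def reciprocal_rank(ranked_ids: Sequence[str], relevant_id: str) -> float:
    """Return 1/rank of the relevant snippet (ranks start at 1), or 0.0 if it is absent."""
    for position, candidate in enumerate(ranked_ids, start=1):
        if candidate == relevant_id:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[str]], relevant_ids: Sequence[str]
) -> float:
    """Average the reciprocal rank over all queries."""
    if len(rankings) != len(relevant_ids):
        raise ValueError("each query needs exactly one relevant snippet id")
    if not rankings:
        return 0.0
    total = sum(
        reciprocal_rank(ranked, relevant)
        for ranked, relevant in zip(rankings, relevant_ids)
    )
    return total / len(rankings)


# Hypothetical toy data: three queries, each with a ranked list of snippet ids.
rankings = [
    ["c1", "c2", "c3"],  # correct snippet at rank 1 -> 1.0
    ["c2", "c1", "c3"],  # correct snippet at rank 2 -> 0.5
    ["c3", "c2", "c1"],  # correct snippet at rank 3 -> 1/3
]
relevant = ["c1", "c1", "c1"]

mrr = mean_reciprocal_rank(rankings, relevant)
print(f"MRR = {mrr:.4f}")
```

Because MRR rewards only the position of the first correct result, a model that moves the right snippet from rank 2 to rank 1 for a query gains 0.5 on that query, while a move from rank 10 to rank 9 gains very little.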
Source journal
CiteScore: 7.70
Self-citation rate: 9.30%
Articles published: 59
Review time: 3.3 months
Journal scope: The main focus of the IEEE Transactions on Consumer Electronics is the engineering and research aspects of the theory, design, construction, manufacture or end use of mass market electronics, systems, software and services for consumers.