Title: RFMC-CS: a representation fusion based multi-view momentum contrastive learning framework for code search
Authors: Gong Chen, Wenjie Liu, Xiaoyuan Xie
Journal: Automated Software Engineering, volume 32, issue 1 (JCR Q3, Computer Science, Software Engineering; impact factor 2.0)
Published: 2025-01-27
DOI: 10.1007/s10515-025-00487-8
URL: https://link.springer.com/article/10.1007/s10515-025-00487-8
Citations: 0
Abstract
Code search is a crucial task in software engineering: given a natural language query, it retrieves relevant code from a codebase. Deep-learning-based code search methods have demonstrated impressive performance, and recent advances in contrastive learning have further enhanced the representation learning of these models. Despite these improvements, existing methods still have limitations in the representation learning of multi-modal data. Specifically, they suffer from semantic loss when learning code representations and fail to fully explore functionally relevant code pairs. To address these limitations, we propose a Representation Fusion based Multi-View Momentum Contrastive Learning Framework for Code Search, named RFMC-CS. RFMC-CS retains the semantic and structural information of code through multi-modal representation and fusion. Through its Multi-View Momentum Contrastive Learning, RFMC-CS further learns the correlations between different modalities of samples and between semantically relevant samples. Experimental results on the CodeSearchNet benchmark show that RFMC-CS outperforms seven advanced baselines on the MRR and Recall@k metrics. Ablation experiments illustrate the effectiveness of each component, and portability experiments show that RFMC-CS has good portability.
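The two metrics named in the abstract, MRR and Recall@k, can be illustrated with a minimal sketch. It assumes each query has exactly one relevant code snippet, which is a simplifying assumption made here rather than a detail stated in the abstract. The function names and the toy identifiers (`c1`, `c2`, `c3`) are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def reciprocal_rank(ranked_ids: Sequence[str], relevant_id: str) -> float:
    """Return 1/rank of the relevant snippet, or 0.0 if it was not retrieved."""
    for position, candidate in enumerate(ranked_ids, start=1):
        if candidate == relevant_id:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[str]], relevant_ids: Sequence[str]
) -> float:
    """Average the reciprocal rank over all queries."""
    if not rankings or len(rankings) != len(relevant_ids):
        raise ValueError("need one non-empty ranking per relevant id")
    scores = [reciprocal_rank(r, g) for r, g in zip(rankings, relevant_ids)]
    return sum(scores) / len(scores)


def recall_at_k(
    rankings: Sequence[Sequence[str]], relevant_ids: Sequence[str], k: int
) -> float:
    """Fraction of queries whose relevant snippet appears in the top k results."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not rankings or len(rankings) != len(relevant_ids):
        raise ValueError("need one non-empty ranking per relevant id")
    hits = [1.0 if g in list(r)[:k] else 0.0 for r, g in zip(rankings, relevant_ids)]
    return sum(hits) / len(hits)


if __name__ == "__main__":
    # Three toy queries; the relevant snippet is "c1" for each of them.
    rankings = [["c1", "c2", "c3"], ["c2", "c1", "c3"], ["c3", "c2", "c1"]]
    gold = ["c1", "c1", "c1"]
    print("MRR:", mean_reciprocal_rank(rankings, gold))
    print("Recall@1:", recall_at_k(rankings, gold, 1))
    print("Recall@2:", recall_at_k(rankings, gold, 2))
```

In the toy data the relevant snippet is ranked first, second, and third, so the reciprocal ranks are 1, 1/2, and 1/3. MRR rewards placing the relevant snippet near the top, while Recall@k only checks whether it falls within the first k results.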
About the journal:
This journal publishes research papers, tutorial papers, surveys, and accounts of significant industrial experience in the foundations, techniques, tools, and applications of automated software engineering technology. This includes the study of techniques for constructing, understanding, adapting, and modeling software artifacts and processes.
Coverage in Automated Software Engineering examines both automatic systems and collaborative systems as well as computational models of human software engineering activities. In addition, it presents knowledge representations and artificial intelligence techniques applicable to automated software engineering, and formal techniques that support or provide theoretical foundations. The journal also includes reviews of books, software, conferences and workshops.