How Do Developers Adapt Code Snippets to Their Contexts? An Empirical Study of Context-Based Code Snippet Adaptations

IEEE Transactions on Software Engineering · IF 6.5 · Zone 1 (Computer Science) · JCR Q1 (Computer Science, Software Engineering) · Pub Date: 2024-04-30 · DOI: 10.1109/TSE.2024.3395519

Tanghaoran Zhang;Yao Lu;Yue Yu;Xinjun Mao;Yang Zhang;Yuxin Zhao

IEEE Transactions on Software Engineering, vol. 50, no. 11, pp. 2712-2731. Journal article. https://ieeexplore.ieee.org/document/10510659/

Citations: 0

Abstract

Reusing code snippets from online programming Q&A communities has become a common development practice, in which developers often need to adapt code snippets to their code contexts to satisfy their own programming needs. However, how developers make these code adaptations based on contexts is still unclear. To bridge this gap, we first conduct a semi-structured interview of 21 developers to investigate their adaptation practices and perceived challenges during this process. The result suggests that code snippet adaptation is a challenging and exhausting task for developers, as they should tailor the snippets to guarantee their correctness and quality with laborious work. We also note that developers all resort to their intra-file context to complete adaptations, which motivates us to further study how developers performed context-based adaptations (CAs) in real scenarios. To this end, we conduct a quantitative study on an adaptation dataset comprising 300 code snippet reuse cases with 1,384 adaptations from Stack Overflow to GitHub. For each adaptation, we manually annotate its intention and relationship with the context. Based on our annotated data, we employ frequent itemset mining to obtain four CA patterns from our dataset, including Fortification, Code Wiring, Attribute-ization and Parameterization. Our main findings reveal that: (1) more than half of the code snippet reuse cases include CAs and 23.3% of the adaptations are CAs; (2) more than half of the CAs are corrective adaptations and variable is the primary adapted language construct; (3) attribute is the most frequently utilized context and 88% of the local contexts are within the nearest 10 LOCs; and (4) CAs towards different intentions are repetitive, which are useful for automatic adaptation. Overall, our study provides valuable insights into code snippet adaptation and has important implications for research, practice, and tool design.
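The abstract names frequent itemset mining as the technique used to derive the CA patterns but does not describe the procedure. As a rough illustration only, the sketch below shows what mining frequent label combinations from annotated adaptations could look like. The label names, the toy records, the `frequent_itemsets` helper, and the support threshold are all hypothetical and are not taken from the paper; the brute-force enumeration is a stand-in for whichever mining algorithm the authors actually used.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for itemsets with support >= min_support.

    Support is the fraction of transactions that contain the itemset.
    This enumerates every subset of every transaction, which is only
    practical when each transaction carries a handful of labels.
    """
    counts = Counter()
    for labels in transactions:
        items = sorted(set(labels))
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    total = len(transactions)
    return {s: c / total for s, c in counts.items() if c / total >= min_support}


# Hypothetical annotations: one set of labels per adaptation.
adaptations = [
    {"intent=corrective", "construct=variable", "context=attribute"},
    {"intent=corrective", "construct=variable", "context=parameter"},
    {"intent=corrective", "construct=variable", "context=attribute"},
    {"intent=other", "construct=other", "context=local"},
]

patterns = frequent_itemsets(adaptations, min_support=0.5)

# Print the most frequent label combinations first.
for itemset, support in sorted(
    patterns.items(), key=lambda kv: (-kv[1], sorted(kv[0]))
):
    print(sorted(itemset), round(support, 2))
```

In this toy setting, a combination of labels that recurs across many adaptations is a candidate pattern; naming and interpreting such a pattern would still be a manual step.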

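Source journal

IEEE Transactions on Software Engineering (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 9.70

Self-citation rate: 10.80%

Articles published: 724

Review time: 6 months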

Journal description: IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:

a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.

b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.

c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.

d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.

e) System issues: Hardware-software trade-offs.

f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.