基于图的野外、细粒度、语义代码更改模式挖掘

H. Nguyen, T. Nguyen, Danny Dig, S. Nguyen, H. Tran, Michael C Hilton
{"title":"基于图的野外、细粒度、语义代码更改模式挖掘","authors":"H. Nguyen, T. Nguyen, Danny Dig, S. Nguyen, H. Tran, Michael C Hilton","doi":"10.1109/ICSE.2019.00089","DOIUrl":null,"url":null,"abstract":"Prior research exploited the repetitiveness of code changes to enable several tasks such as code completion, bug-fix recommendation, library adaption, etc. These and other novel applications require accurate detection of semantic changes, but the state-of-the-art methods are limited to algorithms that detect specific kinds of changes at the syntactic level. Existing algorithms relying on syntactic similarity have lower accuracy, and cannot effectively detect semantic change patterns. We introduce a novel graph-based mining approach, CPatMiner, to detect previously unknown repetitive changes in the wild, by mining fine-grained semantic code change patterns from a large number of repositories. To overcome unique challenges such as detecting meaningful change patterns and scaling to large repositories, we rely on fine-grained change graphs to capture program dependencies. We evaluate CPatMiner by mining change patterns in a diverse corpus of 5,000+ open-source projects from GitHub across a population of 170,000+ developers. We use three complementary methods. First, we sent the mined patterns to 108 open-source developers. We found that 70% of respondents recognized those patterns as their meaningful frequent changes. Moreover, 79% of respondents even named the patterns, and 44% wanted future IDEs to automate such repetitive changes. We found that the mined change patterns belong to various development activities: adaptive (9%), perfective (20%), corrective (35%) and preventive (36%, including refactorings). Second, we compared our tool with the state-of-the-art, AST-based technique, and reported that it detects 2.1x more meaningful patterns. Third, we use CPatMiner to search for patterns in a corpus of 88 GitHub projects with longer histories consisting of 164M SLOCs. It constructed 322K fine-grained change graphs containing 3M nodes, and detected 17K instances of change patterns from which we provide unique insights on the practice of change patterns among individuals and teams. We found that a large percentage (75%) of the change patterns from individual developers are commonly shared with others, and this holds true for teams. Moreover, we found that the patterns are not intermittent but spread widely over time. Thus, we call for a community-based change pattern database to provide important resources in novel applications.","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"47 1","pages":"819-830"},"PeriodicalIF":0.0000,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"41","resultStr":"{\"title\":\"Graph-Based Mining of In-the-Wild, Fine-Grained, Semantic Code Change Patterns\",\"authors\":\"H. Nguyen, T. Nguyen, Danny Dig, S. Nguyen, H. Tran, Michael C Hilton\",\"doi\":\"10.1109/ICSE.2019.00089\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Prior research exploited the repetitiveness of code changes to enable several tasks such as code completion, bug-fix recommendation, library adaption, etc. These and other novel applications require accurate detection of semantic changes, but the state-of-the-art methods are limited to algorithms that detect specific kinds of changes at the syntactic level. Existing algorithms relying on syntactic similarity have lower accuracy, and cannot effectively detect semantic change patterns. We introduce a novel graph-based mining approach, CPatMiner, to detect previously unknown repetitive changes in the wild, by mining fine-grained semantic code change patterns from a large number of repositories. To overcome unique challenges such as detecting meaningful change patterns and scaling to large repositories, we rely on fine-grained change graphs to capture program dependencies. We evaluate CPatMiner by mining change patterns in a diverse corpus of 5,000+ open-source projects from GitHub across a population of 170,000+ developers. We use three complementary methods. First, we sent the mined patterns to 108 open-source developers. We found that 70% of respondents recognized those patterns as their meaningful frequent changes. Moreover, 79% of respondents even named the patterns, and 44% wanted future IDEs to automate such repetitive changes. We found that the mined change patterns belong to various development activities: adaptive (9%), perfective (20%), corrective (35%) and preventive (36%, including refactorings). Second, we compared our tool with the state-of-the-art, AST-based technique, and reported that it detects 2.1x more meaningful patterns. Third, we use CPatMiner to search for patterns in a corpus of 88 GitHub projects with longer histories consisting of 164M SLOCs. It constructed 322K fine-grained change graphs containing 3M nodes, and detected 17K instances of change patterns from which we provide unique insights on the practice of change patterns among individuals and teams. We found that a large percentage (75%) of the change patterns from individual developers are commonly shared with others, and this holds true for teams. Moreover, we found that the patterns are not intermittent but spread widely over time. Thus, we call for a community-based change pattern database to provide important resources in novel applications.\",\"PeriodicalId\":6736,\"journal\":{\"name\":\"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)\",\"volume\":\"47 1\",\"pages\":\"819-830\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"41\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICSE.2019.00089\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSE.2019.00089","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 41

摘要

先前的研究利用代码更改的重复性来实现一些任务,如代码完成、bug修复建议、库适配等。这些和其他新颖的应用程序需要准确地检测语义变化,但是最先进的方法仅限于在语法级别检测特定类型变化的算法。现有的依赖句法相似度的算法准确率较低,不能有效检测语义变化模式。我们引入了一种新颖的基于图的挖掘方法CPatMiner,通过从大量存储库中挖掘细粒度的语义代码更改模式来检测以前未知的重复更改。为了克服独特的挑战,例如检测有意义的变更模式和扩展到大型存储库,我们依赖于细粒度的变更图来捕获程序依赖关系。我们通过挖掘来自GitHub的5000多个开源项目的不同语料库中的变化模式来评估CPatMiner,这些项目涉及170,000多名开发人员。我们使用三种互补的方法。首先,我们将挖掘的模式发送给108个开源开发人员。我们发现70%的受访者认为这些模式是他们有意义的频繁变化。此外,79%的受访者甚至命名了模式,44%的受访者希望未来的ide能够自动化这种重复的更改。我们发现挖掘的变更模式属于各种开发活动:适应性的(9%),完美的(20%),纠正性的(35%)和预防性的(36%,包括重构)。其次,我们将我们的工具与最先进的基于ast的技术进行了比较,并报告说它检测到的有意义的模式多2.1倍。第三,我们使用CPatMiner在包含88个GitHub项目的语料库中搜索模式,这些项目的历史较长,包含164M sloc。它构建了包含3M节点的322K个细粒度变更图,并检测了17K个变更模式实例,从中我们提供了关于个人和团队之间变更模式实践的独特见解。我们发现,来自单个开发人员的很大比例(75%)的变更模式通常与其他人共享,对于团队来说也是如此。此外,我们发现这种模式不是间歇性的,而是随着时间的推移而广泛传播的。因此,我们需要一个基于社区的变化模式数据库来为新的应用程序提供重要的资源。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Graph-Based Mining of In-the-Wild, Fine-Grained, Semantic Code Change Patterns
Prior research exploited the repetitiveness of code changes to enable several tasks such as code completion, bug-fix recommendation, library adaption, etc. These and other novel applications require accurate detection of semantic changes, but the state-of-the-art methods are limited to algorithms that detect specific kinds of changes at the syntactic level. Existing algorithms relying on syntactic similarity have lower accuracy, and cannot effectively detect semantic change patterns. We introduce a novel graph-based mining approach, CPatMiner, to detect previously unknown repetitive changes in the wild, by mining fine-grained semantic code change patterns from a large number of repositories. To overcome unique challenges such as detecting meaningful change patterns and scaling to large repositories, we rely on fine-grained change graphs to capture program dependencies. We evaluate CPatMiner by mining change patterns in a diverse corpus of 5,000+ open-source projects from GitHub across a population of 170,000+ developers. We use three complementary methods. First, we sent the mined patterns to 108 open-source developers. We found that 70% of respondents recognized those patterns as their meaningful frequent changes. Moreover, 79% of respondents even named the patterns, and 44% wanted future IDEs to automate such repetitive changes. We found that the mined change patterns belong to various development activities: adaptive (9%), perfective (20%), corrective (35%) and preventive (36%, including refactorings). Second, we compared our tool with the state-of-the-art, AST-based technique, and reported that it detects 2.1x more meaningful patterns. Third, we use CPatMiner to search for patterns in a corpus of 88 GitHub projects with longer histories consisting of 164M SLOCs. It constructed 322K fine-grained change graphs containing 3M nodes, and detected 17K instances of change patterns from which we provide unique insights on the practice of change patterns among individuals and teams. We found that a large percentage (75%) of the change patterns from individual developers are commonly shared with others, and this holds true for teams. Moreover, we found that the patterns are not intermittent but spread widely over time. Thus, we call for a community-based change pattern database to provide important resources in novel applications.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
VFix: Value-Flow-Guided Precise Program Repair for Null Pointer Dereferences Search-Based Energy Testing of Android Scalable Approaches for Test Suite Reduction A System Identification Based Oracle for Control-CPS Software Fault Localization Training Binary Classifiers as Data Structure Invariants
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1