An Exploratory Study of Functional Redundancy in Code Repositories

2017 IEEE 17th International Working Conference on Source Code Analysis and Manipulation (SCAM); Published: 2017-09-01; DOI: 10.1109/SCAM.2017.21

Marcelo Suzuki, A. C. D. Paula, E. Guerra, C. Lopes, Otávio Augusto Lazzarini Lemos


Citations: 12

Abstract




In large code repositories, functions are likely to repeat across projects. This type of functional redundancy (FR) is desirable for recent code reuse and repair approaches. Yet FR is hard to measure because it is closely related to program equivalence, which is undecidable. This is one reason why most studies of redundancy focus on syntactic replication (e.g., cloning) rather than semantic replication. In this paper we evaluate the extent of FR in a repository of 68 Java projects sampled at random from SourceForge. Our technique approximates functional similarity in two steps: it first searches for methods with similar interfaces (return type, name, and parameter types), and then executes these candidate methods to check which pairs produce matching outputs for a given sample of inputs. Some recent studies have also targeted this type of semantic replication, but our detection approach is generally cheaper and more precise because it works at the method level and uses interfaces to narrow the search space. Although our scope is restricted to static methods, which makes our results conservative, our findings are promising: we found 984 pairs of redundant methods, and 28 of the 68 projects (41.17%) in the repository contained redundancy. Moreover, most of the redundant methods whose source code we could access were not textual clones (only one redundant method pair involved replicated code). Our study also indicates that the proposed detection approach has high precision and is generally inexpensive: only four executions per method were required to reach 100% precision.
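The two-step idea in the abstract (interface matching to shrink the search space, then execution on sample inputs to confirm candidates) can be sketched in Java with reflection. This is a minimal illustration under stated assumptions, not the authors' tool: a "similar interface" is approximated here as identical return and parameter types plus a case-insensitive name match; the sample inputs are supplied by the caller rather than generated; a thrown exception is compared by its class name; and `RedundancyProbe`, `ProjectA`, `ProjectB`, `ProjectC`, and every method name are hypothetical. The abstract reports that four executions per method sufficed for 100% precision in the study; the three inputs below are chosen only for illustration.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;

public class RedundancyProbe {
    // Hypothetical "projects": A and B implement the same function, C does not.
    public static class ProjectA {
        public static int max(int a, int b) { return a >= b ? a : b; }
    }
    public static class ProjectB {
        // Unconventional capitalisation on purpose, to exercise the case-insensitive name match.
        public static int Max(int x, int y) { return Math.max(x, y); }
    }
    public static class ProjectC {
        public static int max(int a, int b) { return a + b; }
    }

    // Assumed notion of a "similar interface": same return type, same parameter
    // types, and the same method name ignoring case.
    static String interfaceKey(Method m) {
        return m.getReturnType().getName() + " "
                + m.getName().toLowerCase(Locale.ROOT)
                + Arrays.toString(m.getParameterTypes());
    }

    // Step 1: group the public static methods of the given classes by interface key.
    static Map<String, List<Method>> groupByInterface(Class<?>... classes) {
        Map<String, List<Method>> groups = new LinkedHashMap<>();
        for (Class<?> c : classes) {
            for (Method m : c.getDeclaredMethods()) {
                if (Modifier.isStatic(m.getModifiers()) && Modifier.isPublic(m.getModifiers())) {
                    groups.computeIfAbsent(interfaceKey(m), k -> new ArrayList<>()).add(m);
                }
            }
        }
        return groups;
    }

    // Invokes a static method; if it throws, the exception's class name is used as its output.
    static Object run(Method m, Object[] args) {
        try {
            return m.invoke(null, args);
        } catch (InvocationTargetException e) {
            return "threw " + e.getCause().getClass().getName();
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    // Step 2: keep a candidate pair only if its outputs agree on every sample input.
    static boolean sameOutputs(Method a, Method b, List<Object[]> samples) {
        for (Object[] args : samples) {
            if (!Objects.deepEquals(run(a, args), run(b, args))) {
                return false;
            }
        }
        return true;
    }

    static String label(Method m) { return m.getDeclaringClass().getSimpleName() + "." + m.getName(); }

    // Runs both steps on the groups whose parameter types match the supplied samples.
    public static List<String> findRedundantPairs(Class<?>[] paramTypes, List<Object[]> samples,
                                                  Class<?>... classes) {
        List<String> pairs = new ArrayList<>();
        for (List<Method> group : groupByInterface(classes).values()) {
            if (!Arrays.equals(group.get(0).getParameterTypes(), paramTypes)) {
                continue; // the samples only fit this one parameter signature
            }
            for (int i = 0; i < group.size(); i++) {
                for (int j = i + 1; j < group.size(); j++) {
                    Method a = group.get(i), b = group.get(j);
                    if (sameOutputs(a, b, samples)) {
                        pairs.add(label(a) + " ~ " + label(b));
                    }
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Object[]> samples = List.of(
                new Object[]{3, 7}, new Object[]{-2, -9}, new Object[]{5, 5});
        Class<?>[] intPair = {int.class, int.class};
        // prints [ProjectA.max ~ ProjectB.Max]
        System.out.println(findRedundantPairs(intPair, samples,
                ProjectA.class, ProjectB.class, ProjectC.class));
    }
}
```

All three hypothetical methods share the interface `int max(int, int)`, so step 1 alone would report three candidate pairs; execution discards the two involving `ProjectC.max`. The choice of inputs matters: with `{0, 0}` as the only sample, `ProjectC.max` would also be flagged, because `0 + 0` equals `max(0, 0)`. This is the kind of false positive that a larger or better-chosen input sample is meant to rule out.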
