Marcelo Suzuki, A. C. D. Paula, E. Guerra, C. Lopes, Otávio Augusto Lazzarini Lemos
2017 IEEE 17th International Working Conference on Source Code Analysis and Manipulation (SCAM), published 2017-09-01. DOI: https://doi.org/10.1109/SCAM.2017.21
An Exploratory Study of Functional Redundancy in Code Repositories
In large code repositories, the probability that functions repeat across projects is high. This type of functional redundancy (FR) is desirable for recent code reuse and repair approaches. Yet FR is hard to measure because it is closely related to program equivalence, which is an undecidable problem. This is one of the reasons most studies that investigate redundancy focus on syntactic rather than semantic replication (e.g., cloning). In this paper we evaluate the extent of FR in a code repository with 68 Java projects taken randomly from SourceForge. Our technique approximates function similarity by first searching for methods that possess similar interfaces (return type, name, and parameter types). We then execute these methods to verify which candidate pairs produce matching outputs for a given sample of inputs. Some recent studies have also focused on this type of semantic replication, but our detection approach is generally cheaper and more precise because it focuses on methods and uses interfaces to reduce the search space. Although our scope is restricted to static methods, which makes our results conservative, our findings are promising. In particular, we found 984 pairs of redundant methods, and 28 of the 68 projects (41.17%) in the repository presented redundancy. Moreover, most redundant methods for which we had access to the source code were not textual clones (only one redundant method pair was replicated code). Our study also indicates that the proposed redundancy detection approach has high precision and is generally inexpensive (only four executions per method were required to attain 100% precision).
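The two-step idea in the abstract (filter candidates by interface, then compare outputs on sampled inputs) can be illustrated with a small sketch. This is not the authors' tooling: the study targets static Java methods, whereas this sketch uses plain Python functions as stand-ins. The helper names (`interface_key`, `same_outputs`, `redundant_pairs`), the project labels, the case-insensitive name match, and the four sample inputs are all assumptions made for illustration.

```python
import inspect
from itertools import combinations
from typing import get_type_hints


def interface_key(name, fn):
    """Build a hashable key from name, return type, and parameter types.

    Assumption: names are 'similar' when they match case-insensitively.
    """
    hints = get_type_hints(fn)
    param_types = tuple(hints[p] for p in inspect.signature(fn).parameters)
    return (name.lower(), hints["return"], param_types)


def same_outputs(f, g, inputs):
    """True if f and g agree on every sampled argument tuple."""
    return all(f(*args) == g(*args) for args in inputs)


def redundant_pairs(methods, samples_by_param_types):
    """Return (name, project_1, project_2) for pairs that agree on all samples.

    methods: list of (project, name, function) triples (hypothetical layout).
    samples_by_param_types: maps a tuple of parameter types to input tuples.
    """
    # Step 1: group methods by interface to shrink the search space.
    groups = {}
    for project, name, fn in methods:
        groups.setdefault(interface_key(name, fn), []).append((project, fn))

    # Step 2: execute each candidate pair on the sampled inputs.
    pairs = []
    for key, members in groups.items():
        inputs = samples_by_param_types.get(key[2], [])
        if not inputs:
            continue  # no samples for this interface, so nothing can be verified
        for (p1, f), (p2, g) in combinations(members, 2):
            if same_outputs(f, g, inputs):
                pairs.append((key[0], p1, p2))
    return pairs


# Three hypothetical "max" methods from three hypothetical projects.
def max_branch(a: int, b: int) -> int:
    return a if a >= b else b


def max_arith(a: int, b: int) -> int:
    return (a + b + abs(a - b)) // 2


def max_buggy(a: int, b: int) -> int:
    return min(a, b)  # same interface, different behavior


METHODS = [
    ("proj_a", "max", max_branch),
    ("proj_b", "max", max_arith),
    ("proj_c", "Max", max_buggy),
]
# Four executions per method, echoing the count reported in the abstract.
SAMPLES = {(int, int): [(1, 2), (5, 3), (-4, -4), (0, 7)]}
PAIRS = redundant_pairs(METHODS, SAMPLES)
print(PAIRS)
```

In this sketch `max_branch` and `max_arith` share no text yet agree on every sample, so they are reported as redundant, while `max_buggy` passes the interface filter but is rejected at the execution step. Agreement on a finite sample only approximates equivalence, which is why the abstract frames the approach as an approximation of an undecidable problem.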