{"title":"Identification of high-level concept clones in source code","authors":"Andrian Marcus, Jonathan I. Maletic","doi":"10.1109/ASE.2001.989796","journal":{"name":"Proceedings 16th Annual International Conference on Automated Software Engineering (ASE 2001)","volume":"28 1","pages":"0"},"publicationDate":"2001-11-26","publicationTypes":"Journal Article","isOpenAccess":false,"citationCount":"316","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1109/ASE.2001.989796"}
Identification of high-level concept clones in source code
Source code duplication occurs frequently within large software systems. Pieces of source code, functions, and data types are often duplicated in part or in whole, for a variety of reasons. Programmers may simply be reusing a piece of code via copy and paste, or they may be "re-inventing the wheel". Previous research on clone detection has focused mainly on identifying pieces of code with similar (or nearly similar) structure. Our approach is to examine the source code text (comments and identifiers) and identify implementations of similar high-level concepts (e.g., abstract data types). The approach uses an information retrieval technique, latent semantic indexing, to statically analyze the software system and determine semantic similarities between source code documents (i.e., functions, files, or code segments). These similarity measures are used to drive the clone detection process. The intention of our approach is to enhance and augment existing clone detection methods that are based on structural analysis; combining the two kinds of methods is expected to improve the quality of clone detection. A set of experiments is presented that demonstrates the use of semantic similarity measures to identify clones within a version of NCSA Mosaic.
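
The pipeline the abstract describes (text from comments and identifiers, latent semantic indexing, similarity between source code documents, clone candidates) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tokenizer, the raw term counts, the choice of k, the 0.9 threshold, the toy documents, and every function name are hypothetical. To stay dependency-free, the truncated SVD is replaced by power iteration on the document Gram matrix, which yields the same document coordinates for small inputs; a real system would call a library SVD routine.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphabetic tokens; underscores and digits act as separators."""
    return re.findall(r"[a-z]+", text.lower())


def term_document_columns(docs):
    """One raw term-count column per document, over a shared sorted vocabulary."""
    counts = [Counter(tokenize(text)) for text in docs]
    vocab = sorted(set().union(*counts))
    return [[c[term] for term in vocab] for c in counts]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def top_eigenpairs(gram, k, iterations=2000):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration + deflation).

    Assumption: a fixed start vector keeps the sketch reproducible; it is not
    a robust general-purpose eigensolver.
    """
    n = len(gram)
    g = [row[:] for row in gram]
    pairs = []
    for _ in range(k):
        v = [1.0 / math.sqrt(n)] * n
        for _step in range(iterations):
            w = [dot(row, v) for row in g]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = dot(v, [dot(row, v) for row in g])
        pairs.append((lam, v))
        # Deflate so the next round converges to the next eigenvector.
        for i in range(n):
            for j in range(n):
                g[i][j] -= lam * v[i] * v[j]
    return pairs


def lsi_document_vectors(docs, k):
    """Coordinates of each document in a k-dimensional LSI space."""
    columns = term_document_columns(docs)
    # Gram matrix A^T A: its eigenpairs give the right singular vectors and
    # the squared singular values of the term-document matrix A.
    gram = [[dot(ci, cj) for cj in columns] for ci in columns]
    pairs = top_eigenpairs(gram, k)
    return [[math.sqrt(max(lam, 0.0)) * v[j] for lam, v in pairs]
            for j in range(len(docs))]


def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu > 0 and nv > 0 else 0.0


def clone_candidates(names, docs, k, threshold):
    """Document pairs whose LSI cosine similarity reaches the threshold."""
    vecs = lsi_document_vectors(docs, k)
    found = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            sim = cosine(vecs[i], vecs[j])
            if sim >= threshold:
                found.append((names[i], names[j], sim))
    return found


# Hypothetical stand-ins for the comments and identifiers of four functions.
names = ["list_a", "list_b", "http", "net"]
texts = [
    "list node next insert head",
    "linked list node append tail next",
    "http request url socket connect",
    "socket connect url fetch http",
]

for a, b, sim in clone_candidates(names, texts, k=2, threshold=0.9):
    print(f"{a} ~ {b}: {sim:.2f}")
```

With these toy inputs the two list documents and the two network documents should be reported as candidate pairs, even though each pair shares only part of its vocabulary. In the spirit of the abstract, such pairs would be handed to a structural clone detector or a human reviewer rather than treated as confirmed clones.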