Multidimensional Correlation of Software Source Code
R. Zeidman
2008 Third International Workshop on Systematic Approaches to Digital Forensic Engineering (SADFE), 22 May 2008
DOI: 10.1109/SADFE.2008.9
Citation count: 6
Abstract
Standard ways of calculating the similarity of different computer programs are needed in computer science. Such measurements can be useful in many areas, including clone detection, refactoring, compiler optimization, and run-time optimization. Such standards are particularly important for uncovering plagiarism, trade secret theft, copyright infringement, and patent infringement. Other uses include locating open source code within a proprietary program and determining the authors of different programs. In a previous paper (R. Zeidman, 2006) I introduced the concept of source code correlation, presented a theoretical basis for such a measure, and described a program, CodeMatch®, that compares software source code and calculates correlation. That paper compared the described method of source code correlation against existing methods of comparing source code and found it to be significantly superior. This paper refines that definition of source code correlation and presents a new, more robust definition of multidimensional source code correlation.
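The abstract does not say which dimensions the refined measure uses or how they are combined, so the following is only a toy sketch of the general idea of scoring similarity along several independent dimensions. It assumes three illustrative dimensions (normalized statement lines, words in comments and string literals, and identifiers) and uses Jaccard set similarity for each. These dimensions, the Jaccard measure, the naive regex tokenizer, and the names `extract`, `jaccard`, and `correlation_vector` are all hypothetical and are not taken from the paper or from CodeMatch®.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately naive tokenization for C-like code.
# It does not handle "//" inside string literals or /* block comments */.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "void", "char", "const"}
LINE_COMMENT = re.compile(r"//(.*)$", re.MULTILINE)
STRING_LIT = re.compile(r'"((?:[^"\\]|\\.)*)"')
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
WORD = re.compile(r"[A-Za-z]+")


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared elements divided by all elements; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class Features:
    statements: set[str]        # whitespace-normalized code lines
    comments_strings: set[str]  # lowercase words from comments and string literals
    identifiers: set[str]       # identifier tokens, minus a small keyword list


def extract(source: str) -> Features:
    """Split one source file into the three hypothetical feature sets."""
    comment_text = " ".join(LINE_COMMENT.findall(source))
    string_text = " ".join(STRING_LIT.findall(source))
    code = LINE_COMMENT.sub("", source)        # remove line comments
    code = STRING_LIT.sub('""', code)          # blank out string literal contents
    statements = {" ".join(line.split()) for line in code.splitlines() if line.strip()}
    identifiers = set(IDENT.findall(code)) - KEYWORDS
    words = {w.lower() for w in WORD.findall(comment_text + " " + string_text)}
    return Features(statements, words, identifiers)


def correlation_vector(src_a: str, src_b: str) -> dict[str, float]:
    """Return one similarity score in [0, 1] per dimension instead of a single number."""
    fa, fb = extract(src_a), extract(src_b)
    return {
        "statements": jaccard(fa.statements, fb.statements),
        "comments_strings": jaccard(fa.comments_strings, fb.comments_strings),
        "identifiers": jaccard(fa.identifiers, fb.identifiers),
    }


if __name__ == "__main__":
    original = """
// compute the total price
int total(int qty, int price) {
    return qty * price;
}
"""
    renamed = """
// compute the total price
int sum(int n, int p) {
    return n * p;
}
"""
    print(correlation_vector(original, renamed))
    # {'statements': 0.2, 'comments_strings': 1.0, 'identifiers': 0.0}
```

In the usage example, renaming every identifier drives the identifier score to zero while the copied comment keeps the comment/string score at 1.0. This illustrates why reporting several dimensions, rather than one collapsed number, can expose copying that a single measure might hide. How (or whether) to combine the dimensions into one score is left open here, since the abstract does not specify it.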