C. Oehmen, Elena S. Peterson, Aaron R. Phillips, Darren S. Curtis
{"title":"A Biosequence-Based Approach to Software Characterization","authors":"C. Oehmen, Elena S. Peterson, Aaron R. Phillips, Darren S. Curtis","doi":"10.1109/SPW.2016.43","DOIUrl":null,"url":null,"abstract":"For many applications, it is desirable to have a process for recognizing when software binaries are closely related without relying on them to be identical or have identical segments. But doing so in a dynamic environment is a nontrivial task because most approaches to software similarity require extensive and time-consuming analysis of a binary, or they fail to recognize executables that are similar but not identical. Presented herein is a novel biosequence-based method for quantifying similarity of executable binaries. Using this method, we show in an example application on large-scale multi-author codes that 1) the biosequence-based method has a statistical performance in recognizing and distinguishing between a collection of real-world high performance computing applications better than 90% of ideal, and 2) an example of using family-tree analysis to tune identification for a code subfamily can achieve better than 99% of ideal performance.","PeriodicalId":341207,"journal":{"name":"2016 IEEE Security and Privacy Workshops (SPW)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE Security and Privacy Workshops (SPW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SPW.2016.43","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
For many applications, it is desirable to have a process for recognizing when software binaries are closely related without relying on them to be identical or have identical segments. But doing so in a dynamic environment is a nontrivial task because most approaches to software similarity require extensive and time-consuming analysis of a binary, or they fail to recognize executables that are similar but not identical. Presented herein is a novel biosequence-based method for quantifying similarity of executable binaries. Using this method, we show in an example application on large-scale multi-author codes that 1) the biosequence-based method has a statistical performance in recognizing and distinguishing between a collection of real-world high performance computing applications better than 90% of ideal, and 2) an example of using family-tree analysis to tune identification for a code subfamily can achieve better than 99% of ideal performance.