Spectral malware behavior clustering
C. Giannella, E. Bloedorn
2015 IEEE International Conference on Intelligence and Security Informatics (ISI), 2015-05-27
DOI: 10.1109/ISI.2015.7165931 (https://doi.org/10.1109/ISI.2015.7165931)
Citations: 7
Abstract
We develop a version of spectral clustering and empirically study its performance when applied to behavior-based malware clustering. In 2011, Rieck et al. reported a behavior-based malware clustering algorithm. We hypothesize that, owing to the more complex nature of our algorithm, it will exhibit higher accuracy than Rieck's but will require greater run-time. Through experiments using three different malware datasets, we largely substantiate this hypothesis. Our approach had comparable or superior accuracy to Rieck's over all of its parameter settings that we examined, and ours had higher run-times (nonetheless, ours ran in less than one minute on all datasets). We also found that, compared with Hierarchical Agglomerative Clustering, our algorithm had no clear accuracy advantage but much smaller run-times.
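For readers unfamiliar with the technique, the following is a minimal sketch of generic textbook spectral bipartitioning using the normalized graph Laplacian. It is not the specific version developed in the paper, which the abstract does not describe. The cosine-similarity affinity, the toy behavior-count vectors, and the function names are all assumptions made for illustration. Power iteration is used in place of a full eigensolver only to keep the example dependency-free.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def deflate(x, top):
    """Remove the component of x that lies along the unit vector top."""
    proj = sum(a * b for a, b in zip(x, top))
    return [a - proj * b for a, b in zip(x, top)]


def spectral_bipartition(vectors, iterations=500):
    """Split samples into two clusters (labels 0/1) by the sign of the
    second eigenvector of the normalized affinity matrix."""
    n = len(vectors)
    # Affinity matrix W with a zero diagonal (assumed: cosine similarity).
    W = [[cosine_similarity(vectors[i], vectors[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in W]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    # M = D^{-1/2} W D^{-1/2}; the normalized Laplacian is I - M.
    M = [[inv_sqrt[i] * W[i][j] * inv_sqrt[j] for j in range(n)]
         for i in range(n)]
    # The top eigenvector of M (eigenvalue 1) is proportional to sqrt(deg).
    total = math.sqrt(sum(deg))
    top = [math.sqrt(d) / total for d in deg]
    # Power iteration on (M + I) / 2, whose eigenvalues lie in [0, 1],
    # deflating the top eigenvector to converge to the second one.
    x = [math.sin(i + 1) for i in range(n)]  # deterministic start vector
    for _ in range(iterations):
        x = deflate(x, top)
        y = [0.5 * (sum(M[i][j] * x[j] for j in range(n)) + x[i])
             for i in range(n)]
        norm = math.sqrt(sum(a * a for a in y))
        if norm == 0.0:
            break
        x = [a / norm for a in y]
    x = deflate(x, top)
    # Rescale by D^{-1/2} and split on sign; which side gets 1 is arbitrary.
    return [1 if inv_sqrt[i] * x[i] >= 0 else 0 for i in range(n)]


# Hypothetical behavior-count vectors: two groups with distinct profiles.
samples = [
    [5, 4, 0, 0], [4, 5, 1, 0], [5, 5, 0, 1],
    [0, 1, 5, 4], [0, 0, 4, 5], [1, 0, 5, 5],
]
labels = spectral_bipartition(samples)
print(labels)
```

On this toy input the first three samples should share one label and the last three the other; which group receives label 1 depends on the sign of the eigenvector and carries no meaning. A practical implementation would use a numerical eigensolver and run k-means on several eigenvectors to obtain more than two clusters.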