File Fragment Type Classification by Bag-Of-Visual-Words
Mina Erfan, S. Jalili
ISC Int. J. Inf. Secur., journal article, published 2021-05-08
DOI: 10.22042/ISECURE.2021.243876.570
Abstract
Classifying the type of a file fragment in the absence of header and file-system information is a major building block of many solutions for file carving, memory analysis, and network forensics. Over the past decades, substantial effort has gone into developing methods to classify file fragments, yet there has been little innovation in the fundamentals of the approaches to file and fragment type classification. In this research, we map each fragment to an 8-bit grayscale image and use a texture-analysis method in place of a classifier. In particular, we show how to construct a vocabulary of visual words with the Bag-of-Visual-Words method. The feature vector is built from visual-word occurrences using the n-gram technique. On the classification of 31 file types over 31,000 fragments, our approach reached a maximum overall accuracy of 74.9% on 512-byte fragments and 87.3% on 4096-byte fragments.
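The pipeline described above (fragment to grayscale image, visual-word vocabulary, n-gram occurrence vector) can be illustrated with a minimal sketch. The abstract does not give the patch descriptor, image width, vocabulary size, clustering method, or value of n, so everything below is an assumption: raw-pixel descriptors of non-overlapping 4×4 patches, plain k-means for the vocabulary, and n-grams over consecutive visual words in raster order. All function names and parameters are hypothetical; this shows the general Bag-of-Visual-Words idea, not the authors' implementation.

```python
import random
from typing import List, Sequence

Vector = List[float]


def fragment_to_image(fragment: bytes, width: int) -> List[List[int]]:
    """Map each byte to one 8-bit grayscale pixel (0-255), row by row.
    Assumption: trailing bytes that do not fill a full row are dropped."""
    rows = len(fragment) // width
    return [list(fragment[r * width:(r + 1) * width]) for r in range(rows)]


def extract_patches(image: List[List[int]], size: int) -> List[Vector]:
    """Cut the image into non-overlapping size x size patches (raster order)
    and flatten each into a descriptor of raw pixel intensities (assumed)."""
    if not image:
        return []
    patches = []
    for top in range(0, len(image) - size + 1, size):
        for left in range(0, len(image[0]) - size + 1, size):
            patches.append([float(image[top + dy][left + dx])
                            for dy in range(size) for dx in range(size)])
    return patches


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(vocab: List[Vector], descriptor: Sequence[float]) -> int:
    """Index of the closest visual word (squared Euclidean distance)."""
    return min(range(len(vocab)), key=lambda i: _sq_dist(vocab[i], descriptor))


def build_vocabulary(descriptors: List[Vector], k: int,
                     iterations: int = 10, seed: int = 0) -> List[Vector]:
    """Plain Lloyd's k-means (assumed); the k centroids are the visual words."""
    rng = random.Random(seed)
    vocab = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iterations):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest_word(vocab, d)].append(d)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster becomes empty
                vocab[i] = [sum(col) / len(members) for col in zip(*members)]
    return vocab


def ngram_features(fragment: bytes, vocab: List[Vector], width: int,
                   size: int, n: int) -> List[int]:
    """Quantize patches to visual words, then count every run of n
    consecutive words. Vector index = base-k encoding of the n-gram."""
    k = len(vocab)
    image = fragment_to_image(fragment, width)
    words = [nearest_word(vocab, p) for p in extract_patches(image, size)]
    counts = [0] * (k ** n)
    for start in range(len(words) - n + 1):
        index = 0
        for w in words[start:start + n]:
            index = index * k + w
        counts[index] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic random fragments stand in for real training data (hypothetical).
    training = [bytes(rng.randrange(256) for _ in range(512)) for _ in range(4)]
    descriptors = [p for frag in training
                   for p in extract_patches(fragment_to_image(frag, 32), 4)]
    vocab = build_vocabulary(descriptors, k=8)
    features = ngram_features(training[0], vocab, width=32, size=4, n=2)
    print(len(features), sum(features))  # prints "64 31"
```

With a 512-byte fragment and a width of 32, the image is 16×32 pixels, so 4×4 patches give 32 visual words and n = 2 yields 31 bigrams in a vector of length k^n = 64. Setting n = 1 reduces this to the ordinary Bag-of-Visual-Words histogram. The abstract does not describe the final decision step, so none is shown.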