Spam Email Image Classification Based on Text and Image Features
Estqlal Hammad Dhah, M. A. Naser, Suhad A. Ali
2019 First International Conference of Computer and Applied Sciences (CAS), published 2019-12-01
DOI: https://doi.org/10.1109/CAS47993.2019.9075725
Citations: 5
Abstract
Filtering image-based spam email remains a major challenge for researchers. This paper proposes a method based on two observations: spam images contain a large percentage of text, whose characteristics differ from those of other types of images, and the features of spam images are highly similar to one another. These observations can be used to distinguish spam images containing text regions from other images. A hybrid method is proposed that combines a feature vector from the text regions with features of the whole image. Two types of features are extracted. The first extraction method applies the local binary pattern (LBP) to extract texture features directly from the image, while the second extracts features from the image's text regions only. The extracted features are used individually and in combination to train the classifiers. A one-class KNN classifier and a two-class KNN classifier are applied separately. Each classifier is used in three configurations: with the text-region features, with the image texture features, and with both sets of features merged. Experimental results show that using image and text features together improves classification effectiveness compared with using only image features or only text features.
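The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the LBP variant, the text-region features, the distance metric, or the value of k, so the code below assumes the basic 3x3 LBP with a 256-bin histogram, Euclidean distance, k = 3, and a distance threshold for the one-class case. All function names (`lbp_histogram`, `combine_features`, `knn_two_class`, `knn_one_class`) and the placeholder text-region vector are hypothetical.

```python
import math
import random

# Neighbour offsets (dy, dx), clockwise from top-left; each sets one bit of the code.
_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_histogram(gray):
    """Normalized 256-bin histogram of basic 3x3 LBP codes.

    `gray` is a 2-D list of intensities (at least 3x3). Border pixels are
    skipped because they lack a full 8-neighbourhood.
    """
    rows, cols = len(gray), len(gray[0])
    hist = [0] * 256
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            center = gray[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(_OFFSETS):
                if gray[y + dy][x + dx] >= center:
                    code |= 1 << bit
            hist[code] += 1
    total = (rows - 2) * (cols - 2)
    return [count / total for count in hist]


def combine_features(texture_vec, text_vec):
    """Merge texture and text-region features by concatenation (an assumption)."""
    return list(texture_vec) + list(text_vec)


def _k_nearest(train_x, query, k):
    """Return the k smallest (distance, index) pairs under Euclidean distance."""
    return sorted((math.dist(x, query), i) for i, x in enumerate(train_x))[:k]


def knn_two_class(train_x, train_y, query, k=3):
    """Majority vote among the k nearest neighbours; labels are 0 (ham) or 1 (spam)."""
    votes = [train_y[i] for _, i in _k_nearest(train_x, query, k)]
    return 1 if 2 * sum(votes) > len(votes) else 0


def knn_one_class(spam_x, query, threshold, k=3):
    """Trained on spam only: accept the query as spam when its mean distance
    to the k nearest spam samples does not exceed `threshold` (hypothetical rule)."""
    nearest = _k_nearest(spam_x, query, k)
    mean_dist = sum(d for d, _ in nearest) / len(nearest)
    return mean_dist <= threshold


if __name__ == "__main__":
    random.seed(0)
    image = [[random.randrange(256) for _ in range(32)] for _ in range(32)]
    texture = lbp_histogram(image)
    text_region = [0.6, 0.2]  # placeholder values, not features from the paper
    print(len(combine_features(texture, text_region)))
```

The three configurations in the abstract correspond to passing `texture`, `text_region`, or the output of `combine_features` as the feature vectors given to either classifier.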