An optimized k-NN classifier based on minimum spanning tree for email filtering
Anirban Chakrabarty, S. Roy
2014 2nd International Conference on Business and Information Management (ICBIM), pp. 47-52, published 2014-12-04
DOI: 10.1109/ICBIM.2014.6970931 (https://doi.org/10.1109/ICBIM.2014.6970931)
Citations: 9
Abstract
In the Internet era, mailboxes are flooded with unwanted email, which makes organizing and classifying legitimate messages into folders troublesome and time-consuming. Although automatic document categorization has been investigated extensively, email classification poses a number of distinct challenges and has received relatively little study. This paper presents an email classification framework, using the Enron email dataset, based on an improved k-NN classifier that uses a minimum spanning tree (MST) clustering algorithm to handle the case where the number of clusters (email folders) is not known in advance. Such classification can help maintain email and web directories and distinguish spam from legitimate mail. Experimental results show that the proposed algorithm outperforms state-of-the-art classifiers such as standard k-NN, Naïve Bayes, and the C4.5 decision tree.
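The abstract does not specify the distance measure, the rule for cutting the tree, or how the clusters feed into k-NN, so the Python sketch below fills those in with stated assumptions: emails are already numeric feature vectors (for example, TF-IDF), distance is Euclidean, MST edges longer than the mean edge weight plus `c` standard deviations are cut (a simplified, global version of the inconsistent-edge idea from Zahn's MST clustering), and a query is classified by a k-NN majority vote restricted to the cluster whose centroid is nearest. The names `MSTKNN`, `mst_edges`, `mst_clusters`, `k`, and `c` are hypothetical, and the toy vectors are illustrative, not Enron data. It shows how an MST can yield clusters without fixing their number beforehand; it is not the authors' published algorithm.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mst_edges(points):
    """Prim's algorithm on the complete distance graph; returns (weight, i, j) tree edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge weight linking each vertex to the tree
    parent = [-1] * n      # tree vertex at the other end of that cheapest edge
    if n:
        best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = euclidean(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def mst_clusters(points, c=1.0):
    """Cut MST edges heavier than mean + c * std; the remaining components are clusters.

    The number of clusters is not given in advance: it is however many
    connected components survive after the long edges are removed.
    """
    edges = mst_edges(points)
    if not edges:
        return list(range(len(points)))
    weights = [w for w, _, _ in edges]
    mean = sum(weights) / len(weights)
    std = math.sqrt(sum((w - mean) ** 2 for w in weights) / len(weights))
    threshold = mean + c * std

    root = list(range(len(points)))  # union-find forest over training points

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for w, i, j in edges:
        if w <= threshold:  # keep short edges, drop the unusually long ones
            root[find(i)] = find(j)
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]


class MSTKNN:
    """Hypothetical k-NN restricted to the MST cluster whose centroid is nearest the query."""

    def __init__(self, k=3, c=1.0):
        self.k, self.c = k, c

    def fit(self, X, y):
        self.X, self.y = X, y
        self.labels = mst_clusters(X, self.c)
        self.members = {}
        for i, lab in enumerate(self.labels):
            self.members.setdefault(lab, []).append(i)
        # Centroid of each cluster = coordinate-wise mean of its members.
        self.centroids = {
            lab: [sum(X[i][d] for i in idx) / len(idx) for d in range(len(X[0]))]
            for lab, idx in self.members.items()
        }
        return self

    def predict(self, q):
        # 1) Choose the cluster whose centroid is closest to the query.
        lab = min(self.centroids, key=lambda l: euclidean(self.centroids[l], q))
        # 2) Ordinary k-NN majority vote, but only among that cluster's members.
        nearest = sorted(self.members[lab], key=lambda i: euclidean(self.X[i], q))[: self.k]
        return Counter(self.y[i] for i in nearest).most_common(1)[0][0]


# Toy usage: two well-separated groups of illustrative 2-D feature vectors.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
y = ["inbox", "inbox", "inbox", "spam", "spam", "spam"]
model = MSTKNN(k=3).fit(X, y)
print(model.labels)                 # [0, 0, 0, 1, 1, 1]
print(model.predict([0.5, 0.5]))    # inbox
print(model.predict([10.2, 10.4]))  # spam
```

Restricting the vote to one cluster means each prediction computes distances to the cluster centroids plus that cluster's members rather than to every training email, which is one plausible reading of "optimized"; the trade-off is that a query near a cluster boundary can miss true neighbors that fall in an adjacent cluster.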