{"title":"垃圾短信过滤中的词权研究","authors":"Turgut Dogan","doi":"10.35377/saucis.03.03.735463","DOIUrl":null,"url":null,"abstract":"Due to rapid development of the technology, the usage of mobile telephones and short message services (SMS) have become widespread. Thus, the number of spam SMS messages has dramatically increased and the significance of identifying and filtering of suchlike messages raised. Moreover, since they have also risk to steal users’ personal information; the problem of identifying and filtering of Spam SMS messages stays popular in terms of also information and data security. In this study, the classification performances of five different term weighting methods on three different datasets containing SMS messages categorized as Spam and legitimate are compared by using two classifiers for corresponding problem. The results obtained showed that reasonable weighting of SMS contents plays an important role in identifying of spam SMS messages. On the other hand, it can be expressed that real classification potential of term weighting schemes reflected betterly the with feature vectors created by using fifty and higher number of terms on especially Turkish and English SMS message datasets. In addition, it has been observed that value ranges of the classification results of obtained from term weighting methods on Turkish SMS message dataset is wider for than ones obtained in English SMS message datasets.","PeriodicalId":257636,"journal":{"name":"Sakarya University Journal of Computer and Information Sciences","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"On Term Weighting for Spam SMS Filtering\",\"authors\":\"Turgut Dogan\",\"doi\":\"10.35377/saucis.03.03.735463\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Due to rapid development of the technology, the usage of mobile telephones and short message services (SMS) have become widespread. Thus, the number of spam SMS messages has dramatically increased and the significance of identifying and filtering of suchlike messages raised. Moreover, since they have also risk to steal users’ personal information; the problem of identifying and filtering of Spam SMS messages stays popular in terms of also information and data security. In this study, the classification performances of five different term weighting methods on three different datasets containing SMS messages categorized as Spam and legitimate are compared by using two classifiers for corresponding problem. The results obtained showed that reasonable weighting of SMS contents plays an important role in identifying of spam SMS messages. On the other hand, it can be expressed that real classification potential of term weighting schemes reflected betterly the with feature vectors created by using fifty and higher number of terms on especially Turkish and English SMS message datasets. In addition, it has been observed that value ranges of the classification results of obtained from term weighting methods on Turkish SMS message dataset is wider for than ones obtained in English SMS message datasets.\",\"PeriodicalId\":257636,\"journal\":{\"name\":\"Sakarya University Journal of Computer and Information Sciences\",\"volume\":\"9 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-12-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Sakarya University Journal of Computer and Information Sciences\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.35377/saucis.03.03.735463\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Sakarya University Journal of Computer and Information Sciences","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.35377/saucis.03.03.735463","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Due to rapid development of the technology, the usage of mobile telephones and short message services (SMS) have become widespread. Thus, the number of spam SMS messages has dramatically increased and the significance of identifying and filtering of suchlike messages raised. Moreover, since they have also risk to steal users’ personal information; the problem of identifying and filtering of Spam SMS messages stays popular in terms of also information and data security. In this study, the classification performances of five different term weighting methods on three different datasets containing SMS messages categorized as Spam and legitimate are compared by using two classifiers for corresponding problem. The results obtained showed that reasonable weighting of SMS contents plays an important role in identifying of spam SMS messages. On the other hand, it can be expressed that real classification potential of term weighting schemes reflected betterly the with feature vectors created by using fifty and higher number of terms on especially Turkish and English SMS message datasets. In addition, it has been observed that value ranges of the classification results of obtained from term weighting methods on Turkish SMS message dataset is wider for than ones obtained in English SMS message datasets.