On Term Weighting for Spam SMS Filtering

Turgut Dogan
{"title":"On Term Weighting for Spam SMS Filtering","authors":"Turgut Dogan","doi":"10.35377/saucis.03.03.735463","DOIUrl":null,"url":null,"abstract":"Due to rapid development of the technology, the usage of mobile telephones and short message services (SMS) have become widespread. Thus, the number of spam SMS messages has dramatically increased and the significance of identifying and filtering of suchlike messages raised. Moreover, since they have also risk to steal users’ personal information; the problem of identifying and filtering of Spam SMS messages stays popular in terms of also information and data security. In this study, the classification performances of five different term weighting methods on three different datasets containing SMS messages categorized as Spam and legitimate are compared by using two classifiers for corresponding problem. The results obtained showed that reasonable weighting of SMS contents plays an important role in identifying of spam SMS messages. On the other hand, it can be expressed that real classification potential of term weighting schemes reflected betterly the with feature vectors created by using fifty and higher number of terms on especially Turkish and English SMS message datasets. In addition, it has been observed that value ranges of the classification results of obtained from term weighting methods on Turkish SMS message dataset is wider for than ones obtained in English SMS message datasets.","PeriodicalId":257636,"journal":{"name":"Sakarya University Journal of Computer and Information Sciences","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Sakarya University Journal of Computer and Information Sciences","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.35377/saucis.03.03.735463","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Due to rapid development of the technology, the usage of mobile telephones and short message services (SMS) have become widespread. Thus, the number of spam SMS messages has dramatically increased and the significance of identifying and filtering of suchlike messages raised. Moreover, since they have also risk to steal users’ personal information; the problem of identifying and filtering of Spam SMS messages stays popular in terms of also information and data security. In this study, the classification performances of five different term weighting methods on three different datasets containing SMS messages categorized as Spam and legitimate are compared by using two classifiers for corresponding problem. The results obtained showed that reasonable weighting of SMS contents plays an important role in identifying of spam SMS messages. On the other hand, it can be expressed that real classification potential of term weighting schemes reflected betterly the with feature vectors created by using fifty and higher number of terms on especially Turkish and English SMS message datasets. In addition, it has been observed that value ranges of the classification results of obtained from term weighting methods on Turkish SMS message dataset is wider for than ones obtained in English SMS message datasets.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
垃圾短信过滤中的词权研究
由于技术的快速发展,移动电话和短信服务(SMS)的使用已经变得普遍。因此,垃圾短信的数量急剧增加,识别和过滤此类短信的重要性也随之提高。此外,由于他们也有窃取用户个人信息的风险;识别和过滤垃圾短信的问题在信息和数据安全方面仍然很受欢迎。在本研究中,通过使用两个分类器对相应问题进行分类,比较了五种不同的词加权方法在包含垃圾短信和合法短信的三种不同数据集上的分类性能。结果表明,合理的短信内容权重对识别垃圾短信具有重要作用。另一方面,在土耳其语和英语短信数据集上,术语加权方案的真实分类潜力更好地反映了使用50个及以上数量的术语创建的特征向量。此外,我们还观察到,在土耳其语短信数据集上使用术语加权方法得到的分类结果的取值范围比在英语短信数据集上得到的分类结果的取值范围更宽。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Prediction of Cardiovascular Disease Based on Voting Ensemble Model and SHAP Analysis A NOVEL ADDITIVE INTERNET OF THINGS (IoT) FEATURES AND CONVOLUTIONAL NEURAL NETWORK FOR CLASSIFICATION AND SOURCE IDENTIFICATION OF IoT DEVICES High-Capacity Multiplier Design Using Look Up Table Sequential and Correlated Image Hash Code Generation with Deep Reinforcement Learning Price Prediction Using Web Scraping and Machine Learning Algorithms in the Used Car Market
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1