Memetic algorithm for short messaging service spam filter using text normalization and semantic approach

A. Ojugo, A. Eboka
Journal: International Journal of Informatics and Communication Technology (IJ-ICT)
DOI: https://doi.org/10.11591/ijict.v9i1.pp9-18
Published: 2020-04-01
Type: Journal Article
Citations: 16

Abstract

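The popularity of the short message service (SMS) has created a favorable environment for spam to thrive. SMS spam includes unsolicited advertising, adult-themed or otherwise inappropriate content, premium-rate fraud, smishing, and malware, and it is a constant reminder of the need for an effective spam filter. However, the SMS size limits of 160 characters and 140 bytes, together with messages being riddled with slang, emoticons, and abbreviations, inhibit the effective training of models for accurate classification. The study proposes a Genetic Algorithm trained Bayesian network that normalizes noisy features and expands the text using lexicographic and semantic dictionaries with a word sense disambiguation technique. The expanded text is used to train the underlying learning heuristics, which in turn helps classify SMS messages into spam and legitimate classes. The hybrid model comprises text preprocessing, feature selection, and training and classification stages. The genetic algorithm (GA) is used for feature selection, while the Bayesian algorithm is used as the classifier.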
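The three stages named in the abstract (text preprocessing, GA feature selection, Bayesian classification) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: a multinomial naive Bayes classifier stands in for the Bayesian network, the `SLANG` dictionary and all function names are hypothetical, the toy messages are invented for the example, and the lexicographic/semantic expansion with word sense disambiguation is omitted.

```python
import math
import random
from collections import Counter

# Hypothetical slang dictionary (assumption; the study's dictionaries are not listed).
SLANG = {"u": "you", "ur": "your", "txt": "text", "2": "to", "pls": "please"}

def normalize(message):
    """Lowercase, split on whitespace, strip punctuation, expand known slang."""
    tokens = []
    for raw in message.lower().split():
        word = raw.strip(".,!?:;")
        if word:
            tokens.append(SLANG.get(word, word))
    return tokens

def train_nb(docs, labels, vocab):
    """Multinomial naive Bayes counts, restricted to the selected vocabulary."""
    counts = {c: Counter() for c in set(labels)}
    for tokens, c in zip(docs, labels):
        counts[c].update(t for t in tokens if t in vocab)
    return counts, Counter(labels), vocab

def predict_nb(model, tokens):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    counts, priors, vocab = model
    total_docs = sum(priors.values())
    best, best_score = None, -math.inf
    for c in sorted(counts):
        denom = sum(counts[c].values()) + len(vocab)
        score = math.log(priors[c] / total_docs)
        for t in tokens:
            if t in vocab:
                score += math.log((counts[c][t] + 1) / denom)
        if score > best_score:
            best, best_score = c, score
    return best

def fitness(mask, features, train, valid):
    """Validation accuracy of naive Bayes trained on the masked feature set."""
    vocab = {f for f, keep in zip(features, mask) if keep}
    model = train_nb([d for d, _ in train], [y for _, y in train], vocab)
    return sum(predict_nb(model, d) == y for d, y in valid) / len(valid)

def ga_select(features, train, valid, pop_size=12, generations=15, seed=0):
    """Plain GA over binary feature masks: elitism, one-point crossover, bit-flip mutation."""
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in features] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: fitness(m, features, train, valid), reverse=True)
        elite = ranked[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, len(features))
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:  # occasional single-bit mutation
                i = rng.randrange(len(features))
                child[i] = not child[i]
            children.append(child)
        pop = elite + children
    best = max(pop, key=lambda m: fitness(m, features, train, valid))
    return {f for f, keep in zip(features, best) if keep}

# Toy messages invented for illustration; they are not data from the study.
TRAIN = [(normalize(m), y) for m, y in [
    ("Free prize! Claim now", "spam"),
    ("WIN free cash prize", "spam"),
    ("Are u coming home?", "legit"),
    ("See u at lunch", "legit"),
]]
VALID = [(normalize(m), y) for m, y in [("free cash now", "spam"), ("u at home", "legit")]]
FEATURES = sorted({t for d, _ in TRAIN for t in d})

selected = ga_select(FEATURES, TRAIN, VALID)
print(sorted(selected))
```

The GA here scores each candidate feature mask by the validation accuracy of the classifier trained on it, which is one common wrapper-style way to couple feature selection with a Bayesian classifier; the study may use a different encoding, fitness function, or local-search step.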