Contributions to the study of bi-lingual Roman Urdu SMS spam filtering

K. Mehmood, H. Afzal, A. Majeed, Hassan Latif
2015 National Software Engineering Conference (NSEC), published 2015-12-01
DOI: 10.1109/NSEC.2015.7396343 (https://doi.org/10.1109/NSEC.2015.7396343)
With the increased use of the internet and mobile phones, the volume of spam on both media has also increased. Spam is a growing threat on both and sometimes causes large financial losses as well as loss of data and confidentiality, so measures are needed to stop it. This paper analyses techniques currently used for spam filtering in the context of mobile text messages. Because SMS content has distinctive characteristics, some techniques may be effective while others may not. Some of the most widely used algorithms and techniques are discussed. Furthermore, we performed automatic spam filtering using machine learning algorithms on Roman Urdu text messages and achieved an accuracy of 92.2% on a manually curated corpus of 8449 messages. The SMS corpus has also been made available for future research.