Khaled Balhaf, Omar A. Darwish, Emad Rawashdeh, Mohammad Abu Awad, Dirar A. Darweesh, Yahya M. Tashtoush, Saif Rawashdeh
2022 Ninth International Conference on Social Networks Analysis, Management and Security (SNAMS). Published 2022-11-29. DOI: 10.1109/SNAMS58071.2022.10062585 (https://doi.org/10.1109/SNAMS58071.2022.10062585)
Classifying Arabian Gulf Tweets to Detect People's Trends: A case study
Media and business companies increasingly use social media to reach large audiences and maximize profit, and they look for the best ways to satisfy their users' requirements. These requirements are difficult to understand because of the very large number of users on platforms such as Twitter. The goal of our research project is therefore to build a classifier that detects trends among Twitter users in the Arabian Gulf area. Such a classifier can help these companies deliver suitable products and media content, such as photos and videos, in line with users' trends. Using a Java-based tool of our own design, we collected a substantial dataset of tweets. We then ran two tweet-classification experiments to compare the effects of balanced and imbalanced training data and to measure the effect of data size on classifier accuracy. Both experiments use Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Naïve Bayes classifiers. The first experiment uses small, imbalanced data sets with four classes: Sport, Politics, Islam, and Culture. The Light Stemmer and the Root Stemmer were each paired with every classifier. The best result in this experiment, an accuracy of 76.27%, was achieved by Naïve Bayes with the Light Stemmer. The second experiment uses a large, balanced data set with the same classifiers and one additional class, Economics. Here the best accuracy (81.17%) was obtained by SVM with the Light Stemmer. The Light Stemmer gave the best results for all classifiers because almost all of the tweets were written in dialects.
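The pairing of a light stemmer with a Naïve Bayes classifier described above can be illustrated with a small sketch. This is not the authors' Java tool: the affix lists, the `light_stem` function, the `NaiveBayes` class, and the four toy tweets are all assumptions made for illustration, and only two of the paper's classes are used. The sketch strips at most one common prefix and one common suffix from each token, then trains a multinomial Naïve Bayes model with add-one smoothing on the stemmed tokens.

```python
import math
from collections import Counter, defaultdict

# Illustrative affix lists (an assumption; real light stemmers use longer lists).
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي")


def light_stem(token):
    """Strip at most one prefix and one suffix, keeping at least 3 letters."""
    for prefix in PREFIXES:
        if token.startswith(prefix) and len(token) - len(prefix) >= 3:
            token = token[len(prefix):]
            break
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            token = token[:-len(suffix)]
            break
    return token


class NaiveBayes:
    """Multinomial Naive Bayes over light-stemmed, whitespace-split tokens."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(light_stem(t) for t in doc.split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        tokens = [light_stem(t) for t in doc.split()]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            # Log prior plus add-one smoothed log likelihood of each token.
            score = math.log(n_docs / total_docs)
            for t in tokens:
                count = self.word_counts[label][t]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training tweets invented for this sketch (not from the paper's dataset).
train = [
    ("مباراة الفريق اليوم", "Sport"),
    ("فاز الفريق بالمباراة", "Sport"),
    ("الحكومة تعلن قرارات", "Politics"),
    ("قرار الحكومة الجديد", "Politics"),
]
model = NaiveBayes().fit([text for text, _ in train], [lab for _, lab in train])

print(model.predict("الفريق والمباراة"))
print(model.predict("قرارات الحكومة"))
```

In this toy data, surface forms such as "مباراة", "بالمباراة", and "والمباراة" reduce to the same stem, so the classifier can share evidence across them. A light stemmer removes only affixes, whereas a root stemmer reduces words to their underlying root.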