R. Duwairi, Mosab Alfaqeh, Mohammad Wardat, Areen Alrabadi
Sentiment analysis for Arabizi text
2016 7th International Conference on Information and Communication Systems (ICICS), pp. 127-132. Published 2016-04-05. DOI: 10.1109/IACS.2016.7476098 (https://doi.org/10.1109/IACS.2016.7476098)
This paper uses supervised learning to assign sentiment (polarity) labels to tweets written in Arabizi, a way of writing Arabic with Latin letters rather than Arabic letters that is common among Arab youth. A rule-based converter was designed and applied to the tweets to convert them from Arabizi to Arabic. The converted tweets were then annotated with their sentiment labels through crowdsourcing. The resulting ArabiziDataset consists of 3206 tweets. The results show, first, that SVM accuracies are higher than Naive Bayes accuracies. Second, removing stopwords and mapping emoticons to their corresponding words did not greatly improve accuracy on Arabizi data. Third, eliminating neutral tweets at an early stage of classification improves Precision for both Naive Bayes and SVM. Recall, however, fluctuated: it improved in some cases and not in others.
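To make the idea of a rule-based Arabizi-to-Arabic converter concrete, here is a minimal sketch in Python. The abstract does not list the paper's actual rules, so the mapping tables `DIGRAPHS` and `SINGLES` and the function name `arabizi_to_arabic` are assumptions chosen for illustration, based on commonly seen Arabizi conventions (for example `7` for ح and `3` for ع). It handles neither short vowels nor ambiguous letters, which a real converter would need to address.

```python
# Illustrative rule tables only: these are NOT the rules used in the paper,
# which the abstract does not specify.
DIGRAPHS = {"sh": "ش", "kh": "خ", "gh": "غ", "th": "ث"}
SINGLES = {
    "2": "ء", "3": "ع", "5": "خ", "6": "ط", "7": "ح", "9": "ص",
    "a": "ا", "b": "ب", "t": "ت", "j": "ج", "d": "د", "r": "ر",
    "z": "ز", "s": "س", "f": "ف", "q": "ق", "k": "ك", "l": "ل",
    "m": "م", "n": "ن", "h": "ه", "w": "و", "y": "ي",
}


def arabizi_to_arabic(text: str) -> str:
    """Convert Arabizi text to Arabic script with a greedy left-to-right scan.

    Two-letter rules are tried before single-character rules. Characters
    with no rule (spaces, punctuation, unmapped vowels) pass through unchanged.
    """
    lowered = text.lower()
    out = []
    i = 0
    while i < len(lowered):
        pair = lowered[i:i + 2]
        if pair in DIGRAPHS:
            out.append(DIGRAPHS[pair])
            i += 2
        else:
            ch = lowered[i]
            out.append(SINGLES.get(ch, ch))
            i += 1
    return "".join(out)


print(arabizi_to_arabic("mr7ba"))  # مرحبا
print(arabizi_to_arabic("sh3b"))   # شعب
```

Trying digraphs first keeps `sh` from being split into س and ه. In a pipeline like the one described above, the converted text would then be passed on to annotation and to the classifiers.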