R. Duwairi, Mosab Alfaqeh, Mohammad Wardat, Areen Alrabadi
Title: Sentiment analysis for Arabizi text
Published in: 2016 7th International Conference on Information and Communication Systems (ICICS), pp. 127-132
Publication date: 2016-04-05
DOI: 10.1109/IACS.2016.7476098 (https://doi.org/10.1109/IACS.2016.7476098)
Citations: 29
Abstract
This paper uses supervised learning to assign sentiment (polarity) labels to tweets written in Arabizi, a form of writing Arabic that uses Latin letters rather than Arabic letters and is common among Arab youth. A rule-based converter was designed and applied to the tweets to convert them from Arabizi to Arabic. The converted tweets were then annotated with their sentiment labels through crowdsourcing. The resulting ArabiziDataset consists of 3206 tweets. The results show, first, that SVM accuracies are higher than Naive Bayes accuracies. Second, removing stopwords and mapping emoticons to their corresponding words did not greatly improve accuracy on the Arabizi data. Third, eliminating neutral tweets at an early stage of classification improves Precision for both Naive Bayes and SVM. Recall, however, fluctuated: in some cases it improved and in others it did not.
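The abstract does not give the converter's rules, so the following is only a minimal sketch of what a rule-based Arabizi-to-Arabic converter could look like. The `RULES` table and the function name `arabizi_to_arabic` are hypothetical. The table is a small subset of widely used Arabizi conventions (digits standing in for letters with no Latin equivalent, plus a few digraphs) and is not the authors' rule set. Vowels are deliberately left unmapped, since handling them properly needs context-dependent rules.

```python
# Hypothetical rule table: a small subset of common Arabizi conventions.
# This is an illustration only, NOT the rule set used in the paper.
RULES = {
    # Digraphs (must be matched before their single-letter prefixes).
    "sh": "ش",
    "kh": "خ",
    "gh": "غ",
    # Digits that stand in for Arabic letters with no Latin equivalent.
    "2": "ء",
    "3": "ع",
    "5": "خ",
    "7": "ح",
    # Single consonants.
    "b": "ب",
    "t": "ت",
    "d": "د",
    "r": "ر",
    "s": "س",
    "f": "ف",
    "k": "ك",
    "l": "ل",
    "m": "م",
    "n": "ن",
    "h": "ه",
    "w": "و",
    "y": "ي",
}

# Longest patterns first, so "sh" wins over "s" followed by "h".
PATTERNS = sorted(RULES, key=len, reverse=True)


def arabizi_to_arabic(text: str) -> str:
    """Greedy longest-match transliteration; unmapped characters pass through."""
    text = text.lower()
    out = []
    i = 0
    while i < len(text):
        for pattern in PATTERNS:
            if text.startswith(pattern, i):
                out.append(RULES[pattern])
                i += len(pattern)
                break
        else:
            # No rule applies (vowel, space, punctuation, emoticon): keep as-is.
            out.append(text[i])
            i += 1
    return "".join(out)
```

For example, `arabizi_to_arabic("7b")` returns the two letters mapped from `7` and `b`. A practical converter would also need rules for vowels, word-position variants, and spelling ambiguity, none of which this sketch attempts.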