Sentimental Analysis based on hybrid approach of Latent Dirichlet Allocation and Machine Learning for Large-Scale of Imbalanced Twitter Data
Authors: Nasir Jamal, Xianqiao Chen, Junaid Hussain Abro, Doniyor Tukhtakhunov
Published in: Proceedings of the 2020 3rd International Conference on Algorithms, Computing and Artificial Intelligence, 2020-12-24
DOI: 10.1145/3446132.3446413 (https://doi.org/10.1145/3446132.3446413)
Citations: 2
Abstract
Classifying emotions in large volumes of Twitter data is an effective way to analyze users' mood about a given product, news item, or topic. However, extracting meaningful features from a burst of raw tweets is challenging because emotions are subjective and have fuzzy boundaries, and the same feeling can be expressed through different terminology and perceptions. In this paper, we propose a hybrid approach that combines Latent Dirichlet Allocation (LDA) with machine learning to predict emotions in large-scale, imbalanced tweet collections. First, the raw tweets are preprocessed by tokenization to capture useful features and discard noisy information. Second, the local and global importance of each feature is estimated with the TF-IDF statistic. Third, LDA topic modeling is used to extract topics from these features; the topics summarize the concepts in related tweets, which helps classification. Fourth, the Adaptive Synthetic (ADASYN) class-balancing technique is applied to oversample the data and balance each class. Finally, the K-Nearest Neighbor (KNN) algorithm predicts the emotions from the extracted topics. Class balancing increases the weight of minority classes and addresses the class-imbalance problem. The proposed approach is evaluated on two Twitter emotion datasets. The results show that it outperforms popular state-of-the-art methods in terms of precision, recall, F-measure, and classification accuracy.
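The last two steps of the pipeline, ADASYN oversampling followed by KNN classification, can be illustrated with a minimal standard-library sketch. It is not the authors' implementation. The three-dimensional vectors are made-up stand-ins for LDA topic proportions, the labels `joy` and `anger` are hypothetical, and the function names are chosen for illustration only. The oversampler handles a single minority class and draws base points in proportion to the ADASYN density ratio instead of rounding a per-point count, so it is ADASYN-style, not a faithful reproduction.

```python
import math
import random
from collections import Counter


def k_nearest(query, points, k, skip=None):
    """Indices of the k points closest to query (Euclidean), ignoring index `skip`."""
    order = sorted(
        (i for i in range(len(points)) if i != skip),
        key=lambda i: math.dist(query, points[i]),
    )
    return order[:k]


def adasyn(X, y, minority, k=3, seed=0):
    """ADASYN-style oversampling for one minority class (illustrative sketch)."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_needed = (len(y) - len(min_idx)) - len(min_idx)  # rows needed for balance
    if n_needed <= 0 or len(min_idx) < 2:
        return list(X), list(y)

    # r_i: share of other-class points among the k nearest neighbours of x_i.
    ratios = []
    for i in min_idx:
        neigh = k_nearest(X[i], X, k, skip=i)
        ratios.append(sum(y[j] != minority for j in neigh) / len(neigh))
    total = sum(ratios)
    # Fall back to uniform weights if no minority point has other-class neighbours.
    if total > 0:
        weights = [r / total for r in ratios]
    else:
        weights = [1 / len(min_idx)] * len(min_idx)

    min_points = [X[i] for i in min_idx]
    X_new, y_new = list(X), list(y)
    # Harder-to-learn minority points (higher r_i) are chosen as bases more often.
    for pos in rng.choices(range(len(min_idx)), weights=weights, k=n_needed):
        base = min_points[pos]
        partner = min_points[rng.choice(k_nearest(base, min_points, k, skip=pos))]
        lam = rng.random()
        # Synthetic row lies on the segment between base and a minority neighbour.
        X_new.append([b + lam * (p - b) for b, p in zip(base, partner)])
        y_new.append(minority)
    return X_new, y_new


def knn_predict(X_train, y_train, query, k=3):
    """Majority vote among the k nearest training rows."""
    votes = Counter(y_train[i] for i in k_nearest(query, X_train, k))
    return votes.most_common(1)[0][0]


# Toy topic-proportion vectors (assumed data): 6 "joy" rows, 2 "anger" rows.
X = [
    [0.80, 0.10, 0.10], [0.70, 0.20, 0.10], [0.75, 0.15, 0.10],
    [0.60, 0.30, 0.10], [0.65, 0.25, 0.10], [0.70, 0.10, 0.20],
    [0.10, 0.20, 0.70], [0.20, 0.20, 0.60],
]
y = ["joy"] * 6 + ["anger"] * 2

X_bal, y_bal = adasyn(X, y, minority="anger")
counts = Counter(y_bal)
prediction = knn_predict(X_bal, y_bal, [0.15, 0.20, 0.65])
print(dict(counts), prediction)
```

In a full pipeline, the rows of `X` would come from the topic distributions that LDA assigns to each tweet, and the oversampling would be applied to the training split only, so that synthetic rows do not leak into evaluation.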