Sentimental Analysis based on hybrid approach of Latent Dirichlet Allocation and Machine Learning for Large-Scale of Imbalanced Twitter Data

Nasir Jamal, Xianqiao Chen, Junaid Hussain Abro, Doniyor Tukhtakhunov
Proceedings of the 2020 3rd International Conference on Algorithms, Computing and Artificial Intelligence, published 2020-12-24
DOI: 10.1145/3446132.3446413
Citations: 2

Abstract

Emotion classification over large volumes of Twitter data is an effective way to analyze users' moods about a product, a news story, a topic, and so on. However, extracting meaningful features from a burst of raw tweets is challenging, because emotions are subjective and the boundaries between them are fuzzy. The same subjective features can also be expressed through different terminologies and perceptions. In this paper, we propose a hybrid approach that combines Latent Dirichlet Allocation (LDA) and machine learning to predict emotions in large-scale, imbalanced tweet data. First, the raw tweets are preprocessed by tokenization to capture useful features while discarding noisy information. Second, the local and global importance of each feature is estimated with the TF-IDF statistic. Third, LDA topic modeling is used to extract topics from these features; these topics capture the concepts of related tweets, which is helpful for classification. Fourth, the Adaptive Synthetic (ADASYN) sampling technique is applied to oversample the data and balance the topic classes. Finally, the K-Nearest Neighbor (KNN) algorithm is applied to predict emotions from the extracted topics. Class balancing increases the significance of minority classes and addresses the class-imbalance problem. The proposed approach is evaluated on two different Twitter emotion datasets, and the results show that it outperforms popular state-of-the-art methods in precision, recall, F-measure, and classification accuracy.
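The five stages listed in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: the function names (`tokenize`, `prune_by_tfidf`, `lda_theta`, `adasyn`, `knn_predict`), the tokenization rules, and all hyperparameters are hypothetical. It also makes two labeled substitutions. First, TF-IDF is used only to discard terms that occur in every tweet, since the abstract does not say exactly how the TF-IDF weights feed into LDA. Second, LDA is fitted by collapsed Gibbs sampling on token counts, and each tweet's topic proportions become its feature vector. ADASYN is applied one class against the rest, and KNN uses Euclidean distance with a majority vote.

```python
import math
import random
import re
from collections import Counter


def tokenize(tweet):
    # Hypothetical cleaning rules: lowercase, drop URLs and @mentions, keep letters.
    tweet = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
    return re.findall(r"[a-z]+", tweet)


def prune_by_tfidf(docs):
    # TF-IDF(t, d) = (count / len(d)) * log(N / df(t)). A term found in every tweet
    # scores 0 everywhere; this sketch drops such terms before LDA (assumption).
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    keep = {t for d in docs for t, c in Counter(d).items()
            if c / len(d) * math.log(n / df[t]) > 0}
    return [[t for t in d if t in keep] for d in docs]


def lda_theta(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    # Collapsed Gibbs sampling for LDA; returns each tweet's topic proportions.
    rng = random.Random(seed)
    vocab = {t: i for i, t in enumerate(sorted({t for d in docs for t in d}))}
    v = len(vocab)
    ndk = [[0] * k for _ in docs]        # tweet-topic counts
    nkw = [[0] * v for _ in range(k)]    # topic-word counts
    nk = [0] * k                         # tokens assigned to each topic

    def bump(d, j, w, delta):
        ndk[d][j] += delta
        nkw[j][w] += delta
        nk[j] += delta

    z = [[rng.randrange(k) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for t, j in zip(doc, z[d]):
            bump(d, j, vocab[t], 1)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, t in enumerate(doc):
                w = vocab[t]
                bump(d, z[d][i], w, -1)
                # p(z = j | rest) is proportional to (n_dj + alpha)(n_jw + beta) / (n_j + V beta)
                p = [(ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + v * beta)
                     for j in range(k)]
                z[d][i] = rng.choices(range(k), weights=p)[0]
                bump(d, z[d][i], w, 1)
    return [[(ndk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
            for d, doc in enumerate(docs)]


def adasyn(X, y, k=5, seed=0):
    # ADASYN applied class by class (one class vs. the rest): each class is grown
    # toward the majority size, and samples with more foreign neighbours get more copies.
    rng = random.Random(seed)
    counts = Counter(y)
    n_major = max(counts.values())
    X_out, y_out = [list(x) for x in X], list(y)

    def nearest(i, pool):
        return sorted((j for j in pool if j != i),
                      key=lambda j: math.dist(X[i], X[j]))[:k]

    for cls, n_cls in counts.items():
        G = n_major - n_cls                  # synthetic samples wanted (balance level 1)
        members = [i for i, lab in enumerate(y) if lab == cls]
        if G == 0 or len(members) < 2:
            continue
        # r_i: share of x_i's k nearest neighbours (whole data set) from other classes
        r = [sum(y[j] != cls for j in nearest(i, range(len(X)))) / k for i in members]
        if sum(r) == 0:                      # no overlap: spread evenly (assumption)
            r = [1.0] * len(members)
        total = sum(r)
        for i, ri in zip(members, r):
            peers = nearest(i, members)      # same-class neighbours to interpolate toward
            for _ in range(round(ri / total * G)):
                j, lam = rng.choice(peers), rng.random()
                X_out.append([a + lam * (b - a) for a, b in zip(X[i], X[j])])
                y_out.append(cls)
    return X_out, y_out


def knn_predict(X, y, query, k=5):
    # Majority vote among the k nearest training points (Euclidean distance).
    nearest = sorted(range(len(X)), key=lambda j: math.dist(query, X[j]))[:k]
    return Counter(y[j] for j in nearest).most_common(1)[0][0]
```

Chained together, the features would be `lda_theta(prune_by_tfidf([tokenize(t) for t in tweets]), k=...)`, followed by `adasyn` and `knn_predict`. In an actual evaluation, `adasyn` should be run on the training split only, so that synthetic samples never appear in the test data. Gibbs sampling was chosen here only because it fits in a few lines of standard-library code; scikit-learn's `LatentDirichletAllocation`, for example, uses variational inference instead.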