An automated learning model for sentiment analysis and data classification of Twitter data using balanced CA-SVM

Concurrent Engineering · Pub Date: 2021-07-20 · DOI: 10.1177/1063293X211031485

C. Cyril, J. Beulah, Neelakandan Subramani, Prakash Mohan, A. Harshavardhan, D. Sivabalaselvamani

Volume: 31 1 · Pages: 386-395 · Publication type: Journal Article

Citations: 39

Abstract

People now spend much of each day on social media, where they share many details with their friends. Information obtained from these exchanges has been used in several applications. Sentiment analysis is one of them: applied to a Twitter data set, it identifies the emotion of a user, and that information can then be used to solve various problems. First, the data from the Twitter database is preprocessed through tokenization, stemming, stop word removal, and number removal. The proposed automated learning model, based on CA-SVM sentiment analysis, reads the Twitter data set, and the tweets are then processed to extract features, which yields a set of terms. Using these terms, the tweets are clustered with TGS-K means clustering, which measures Euclidean distance over features such as semantic sentiment score (SSS), gazetteer and symbolic sentiment support (GSSS), and topical sentiment score (TSS). The method then classifies each tweet with a support vector machine (CA-SVM) according to a support value computed from the above two measures. The results are validated using k-fold cross-validation. Classification is then performed with the Balanced CA-SVM (Deep Learning Modified Neural Network). The results are evaluated and compared with existing works: the proposed model achieved 92.48% accuracy and a 92.05% sentiment score.
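The preprocessing step named in the abstract (tokenization, stemming, stop word removal, and number removal) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the stop-word list, the suffix list, and the function names `tokenize`, `naive_stem`, and `preprocess` are all assumptions made for illustration, and the crude suffix stripper merely stands in for a real stemmer such as Porter's.

```python
import re

# Tiny illustrative stop-word list (an assumption; the abstract does not give one).
STOP_WORDS = {"a", "an", "and", "are", "is", "the", "to", "of", "in", "at", "this", "it"}

# Suffixes removed by the crude stemmer below (a stand-in for a real stemmer).
SUFFIXES = ("ing", "ed", "ly", "es", "s")


def tokenize(text):
    """Lowercase the tweet and split it into alphabetic and numeric tokens."""
    return re.findall(r"[a-z]+|\d+", text.lower())


def naive_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(tweet):
    """Tokenize, drop numbers and stop words, then stem what remains."""
    tokens = tokenize(tweet)
    tokens = [t for t in tokens if not t.isdigit()]      # number removal
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop word removal
    return [naive_stem(t) for t in tokens]               # stemming


if __name__ == "__main__":
    print(preprocess("Loving the 3 new phones released in 2021"))
    # → ['lov', 'new', 'phon', 'releas']
```

The resulting terms are the kind of input from which the sentiment features would be computed; the sketch does not handle URLs, mentions, or hashtags, which a real Twitter pipeline would need to treat separately.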

