{"title":"Empirical Analysis of Supervised and Unsupervised Machine Learning Algorithms with Aspect-Based Sentiment Analysis","authors":"Satwinder Singh, Harpreet Kaur, Rubal Kanozia, Gurpreet Kaur","doi":"10.2478/acss-2023-0012","DOIUrl":null,"url":null,"abstract":"Abstract Machine learning based sentiment analysis is an interdisciplinary approach in opinion mining, particularly in the field of media and communication research. In spite of their different backgrounds, researchers have collaborated to test, train and again retest the machine learning approach to collect, analyse and withdraw a meaningful insight from large datasets. This research classifies the texts of micro-blog (tweets) into positive and negative responses about a particular phenomenon. The study also demonstrates the process of compilation of corpus for review of sentiments, cleaning the body of text to make it a meaningful text, find people’s emotions about it, and interpret the findings. Till date the public sentiment after abrogation of Article 370 has not been studied, which adds the novelty to this scientific study. This study includes the dataset collection from Twitter that comprises 66.7 % of positive tweets and 34.3 % of negative tweets of the people about the abrogation of Article 370. Experimental testing reveals that the proposed methodology is much more effective than the previously proposed methodology. This study focuses on comparison of unsupervised lexicon-based models (TextBlob, AFINN, Vader Sentiment) and supervised machine learning models (KNN, SVM, Random Forest and Naïve Bayes) for sentiment analysis. This is the first study with cyber public opinion over the abrogation of Article 370. Twitter data of more than 2 lakh tweets were collected by the authors. After cleaning, 29732 tweets were selected for analysis. As per the results among supervised learning, Random Forest performs the best, whereas among unsupervised learning TextBlob achieves the highest accuracy of 99 % and 88 %, respectively. Performance parameters of the proposed supervised machine learning models also surpass the result of the recent study performed in 2023 for sentiment analysis.","PeriodicalId":41960,"journal":{"name":"Applied Computer Systems","volume":"139 1","pages":"125 - 136"},"PeriodicalIF":0.5000,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Computer Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2478/acss-2023-0012","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
引用次数: 0
Abstract
Abstract Machine learning based sentiment analysis is an interdisciplinary approach in opinion mining, particularly in the field of media and communication research. In spite of their different backgrounds, researchers have collaborated to test, train and again retest the machine learning approach to collect, analyse and withdraw a meaningful insight from large datasets. This research classifies the texts of micro-blog (tweets) into positive and negative responses about a particular phenomenon. The study also demonstrates the process of compilation of corpus for review of sentiments, cleaning the body of text to make it a meaningful text, find people’s emotions about it, and interpret the findings. Till date the public sentiment after abrogation of Article 370 has not been studied, which adds the novelty to this scientific study. This study includes the dataset collection from Twitter that comprises 66.7 % of positive tweets and 34.3 % of negative tweets of the people about the abrogation of Article 370. Experimental testing reveals that the proposed methodology is much more effective than the previously proposed methodology. This study focuses on comparison of unsupervised lexicon-based models (TextBlob, AFINN, Vader Sentiment) and supervised machine learning models (KNN, SVM, Random Forest and Naïve Bayes) for sentiment analysis. This is the first study with cyber public opinion over the abrogation of Article 370. Twitter data of more than 2 lakh tweets were collected by the authors. After cleaning, 29732 tweets were selected for analysis. As per the results among supervised learning, Random Forest performs the best, whereas among unsupervised learning TextBlob achieves the highest accuracy of 99 % and 88 %, respectively. Performance parameters of the proposed supervised machine learning models also surpass the result of the recent study performed in 2023 for sentiment analysis.