Machine Learning based Sentiment Analysis of Hindi Data with TF-IDF and Count Vectorization
Authors: Ashwani Gupta, U. Sharma
Published in: 2022 7th International Conference on Computing, Communication and Security (ICCCS)
Publication date: 2022-11-03
DOI: 10.1109/ICCCS55188.2022.10079323 (https://doi.org/10.1109/ICCCS55188.2022.10079323)
Citation count: 0
Abstract
Sentiment refers to emotions. Sentiment analysis, also known as opinion mining, is the technique of identifying and extracting subjective information from pre-web and post-web reviews using text analytics, computational linguistics, and natural language processing. Hindi is an Indian language spoken by many Indians. With the phenomenal growth of online product reviews, post-web reviews written in Hindi are also increasing rapidly. This paper presents a machine-learning-based method for analyzing post-web text data. The method is divided into four steps. First, an annotated Hindi review dataset is developed from post-web sources. Second, features are extracted from the annotated Hindi review dataset using term frequency-inverse document frequency (TF-IDF) and count vectorization. Third, the extracted features are passed to a classifier, which makes the predictions. Fourth, the annotated dataset is translated into English, and the second and third steps are repeated on the annotated English dataset. The results are reported using a range of evaluation criteria, including precision, recall, and F1-score. In both instances, the support vector machine produced the most pertinent results.
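The feature-extraction and evaluation steps named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The whitespace tokenizer, the smoothed IDF variant `ln((1 + N) / (1 + df)) + 1`, the three toy Hindi sentences, and all function names are assumptions made for illustration; the abstract does not specify any of them. The classifier step (a support vector machine in the paper) is left out here and would normally come from a machine learning library.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenization; an assumption, the paper's tokenizer is not specified.
    return text.split()


def build_vocabulary(docs):
    # Map each distinct token to a column index, in sorted order.
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def count_vectorize(docs, vocab):
    # One row per document; each cell holds the raw count of that term.
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts.get(tok, 0) for tok in vocab])
    return rows


def tfidf_vectorize(docs, vocab):
    # Weight raw counts by a smoothed inverse document frequency.
    # The IDF variant is an assumption; no length normalization is applied.
    counts = count_vectorize(docs, vocab)
    n_docs = len(docs)
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    return [[c * w for c, w in zip(row, idf)] for row in counts]


def precision_recall_f1(y_true, y_pred, positive):
    # Binary precision, recall, and F1 for the class named by `positive`.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy reviews invented for illustration; they are not from the paper's dataset.
docs = ["फोन बहुत अच्छा है", "फोन बहुत खराब है", "कैमरा अच्छा है"]
vocab = build_vocabulary(docs)
count_features = count_vectorize(docs, vocab)
tfidf_features = tfidf_vectorize(docs, vocab)
```

In this sketch a term that occurs in every document, such as "है", keeps its raw count under TF-IDF (its IDF is exactly 1), while a term confined to one document, such as "खराब", is weighted up. That difference is the practical distinction between the two feature sets.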