Classification of Hotel Reviews Using Sentiment Analysis and Machine Learning
Khalid Shifullah, H. M. Rakibullah, Nuzhat Islam, Hasin Raihan, Md. Ashik Iqbal, Dewan Ziaul Karim, Annajiat Alim Rasel
2022 25th International Conference on Computer and Information Technology (ICCIT), published 2022-12-17
DOI: 10.1109/ICCIT57492.2022.10054884 (https://doi.org/10.1109/ICCIT57492.2022.10054884)
Citations: 1
Abstract
Social media has become an essential part of daily life for people all over the world. It gives people a platform to share thoughts, emotions, opinions, and ideas, which has caused a surge in the volume of data. Data at this scale can be analyzed with sentiment analysis and text classification by building an effective machine learning model. Such analysis yields insights that would be nearly impossible to obtain manually because of the sheer volume of data. This research focuses on user comments and reviews of different hotels and aims to predict their sentiment. The datasets consist of hotel comments and reviews collected from online sites. Text pre-processing techniques such as tokenization, case folding, stopword removal, lemmatization, and duplicate data removal were applied. TF-IDF and Bag of Words were used to convert the text into numerical feature vectors. The effectiveness of supervised machine learning algorithms, including Support Vector Machine, Naïve Bayes, Random Forest, and Logistic Regression, was then evaluated. The comparative analysis showed that Logistic Regression achieved the highest accuracy, ranging from 86 to 89 percent.
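To make the pre-processing and vectorization steps named in the abstract concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation. The stopword list, the sample reviews, and the function names `preprocess`, `bag_of_words`, and `tf_idf` are hypothetical, and the TF-IDF weighting shown (term frequency times ln(N/df)) is one common textbook variant, assumed here because the abstract does not specify one. Lemmatization is omitted because it needs a dictionary-backed tool.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stopword list for illustration only;
# the abstract does not say which list was used.
STOPWORDS = {"the", "a", "an", "was", "is", "and", "were", "but", "very"}

def preprocess(reviews):
    """Case-fold, tokenize, drop stopwords, and remove duplicate reviews."""
    seen, docs = set(), []
    for text in reviews:
        # Case folding followed by a simple regex tokenizer.
        tokens = [t for t in re.findall(r"[a-z']+", text.lower())
                  if t not in STOPWORDS]
        key = tuple(tokens)
        if tokens and key not in seen:  # duplicate data removal
            seen.add(key)
            docs.append(tokens)
    return docs

def bag_of_words(docs):
    """Return the sorted vocabulary and one raw term-count vector per document."""
    vocab = sorted({t for doc in docs for t in doc})
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[t] for t in vocab])
    return vocab, vectors

def tf_idf(docs):
    """Return the vocabulary and TF-IDF vectors with tf = count/len, idf = ln(N/df)."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    vectors = [
        [(row[j] / len(doc)) * math.log(n_docs / df[j]) for j in range(len(vocab))]
        for doc, row in zip(docs, counts)
    ]
    return vocab, vectors

# Made-up sample reviews; the third is an exact duplicate of the first.
reviews = [
    "The room was clean and the staff were friendly.",
    "The room was dirty and the staff were rude.",
    "The room was clean and the staff were friendly.",
]
docs = preprocess(reviews)
vocab, bow = bag_of_words(docs)
_, weights = tf_idf(docs)
```

In this toy input, terms that occur in every remaining review ("room", "staff") receive a TF-IDF weight of zero, while terms that distinguish the reviews ("clean", "dirty") keep a positive weight. The resulting vectors are the kind of input a classifier such as Logistic Regression would be trained on. Library implementations such as scikit-learn's `TfidfVectorizer` apply smoothing and normalization by default, so their values would differ from this sketch.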