K. Harsha, S. Yuva Nitya, Sravani Kota, K. Satyanarayana, Jaya Lakshmi
Empirical evaluation of Amazon fine food reviews using Text Mining
2023 IEEE 8th International Conference for Convergence in Technology (I2CT), published 2023-04-07
DOI: 10.1109/I2CT57861.2023.10126349 (https://doi.org/10.1109/I2CT57861.2023.10126349)
Citations: 0
Abstract
Approximately 1.6 million individuals use the e-commerce website Amazon to buy products from a variety of categories, including food. Reviews from consumers who have already purchased a product help those who are considering buying it, but reviews can be either positive or negative, and a buyer cannot realistically read so many of them before making a purchase. Machine learning models make this manageable. Our objective is to classify the reviews using the attributes present in the dataset. In its raw form, the data contains redundancy. Because reviews with a score of 3 are regarded as neutral, we delete them along with the redundant entries. We then use the NLP toolkit to preprocess the review text (a column in the dataset): we remove stop words (such as "in", "as", "is", and "on") and punctuation, and we convert every letter to lowercase. Text processing is necessary because customer reviews written in human language cannot be read directly by machines, and the data must be in machine-readable form before any classification technique can be applied. The proposed approach therefore converts the text into a machine-readable representation using word embedding techniques. Once preprocessing is complete, we split the data into training and test sets. We train classifiers such as logistic regression and XGBoost on the training set and then apply the trained model to the test set to measure its accuracy. Finally, we use the model we developed to predict the sentiment of a review based on previous reviews. In this project, we build a model, train it on existing reviews, apply it to upcoming reviews, and forecast whether the product is good or not. The dataset for this work was taken from Kaggle.
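The preprocessing steps described in the abstract (dropping score-3 reviews, removing duplicates, lowercasing, and stripping stop words and punctuation) can be sketched in plain Python. This is a minimal illustration and not the authors' code: the stop-word list, the function names `clean_text` and `prepare`, the sample rows, and the labeling rule (score above 3 is positive, below 3 is negative) are all assumptions. The embedding, train/test split, and classifier steps are left out.

```python
import string

# Small illustrative stop-word list (an assumption; the paper's list is not given).
STOP_WORDS = {"in", "as", "is", "on", "the", "a", "an", "and", "it", "this", "to", "of"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text):
    """Lowercase the text, strip punctuation, and drop stop words."""
    no_punct = text.lower().translate(_PUNCT_TABLE)
    return " ".join(w for w in no_punct.split() if w not in STOP_WORDS)


def prepare(reviews):
    """Turn (score, text) pairs into (label, cleaned_text) pairs.

    Score-3 reviews are dropped as neutral and exact duplicates are removed.
    Assumed labeling: score > 3 -> 1 (positive), score < 3 -> 0 (negative).
    """
    seen = set()
    prepared = []
    for score, text in reviews:
        if score == 3:
            continue  # neutral review, excluded
        key = (score, text)
        if key in seen:
            continue  # redundant entry, excluded
        seen.add(key)
        prepared.append((1 if score > 3 else 0, clean_text(text)))
    return prepared


# Hypothetical sample rows, not taken from the Kaggle dataset.
sample = [
    (5, "This tea is great!"),
    (5, "This tea is great!"),
    (3, "It is okay."),
    (1, "Stale, and the bag is torn."),
]
result = prepare(sample)
```

The cleaned strings in `result` would then be passed to a word embedding step and split into training and test sets before a classifier such as logistic regression or XGBoost is fitted.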