Pub Date: 2022-06-07 DOI: 10.17977/um018v5i12022p101-108
Naveed Sultan
Today, almost everything is sold online, and many individuals post reviews of products to give feedback. These reviews tell businesses about buyer opinions, product performance, product quality, and seller service. This project focuses on buyer opinions expressed in mobile phone reviews. Sentiment analysis is the task of analyzing such data to extract opinions about products and services and classify them as positive, negative, or neutral. This insight can help companies improve their products and help potential buyers make the right decisions. Before the reviews can be classified by a model trained on a dataset, they must be preprocessed to remove unwanted data such as stop words, verbs, POS tags, punctuation, and attachments. Many techniques exist for these tasks; in this article, we use a model built on several supervised machine learning techniques.
Title: Sentiment Analysis of Amazon Product Reviews using Supervised Machine Learning Techniques. Journal: Knowledge Engineering and Data Science. DOI URL: https://doi.org/10.17977/um018v5i12022p101-108
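As a rough illustration of the preprocessing step described in the abstract above, the following Python sketch lowercases a review, strips punctuation, and drops stop words. The function name `preprocess` and the tiny `STOP_WORDS` set are assumptions made for this example; they do not come from the paper, which may use a different toolkit and a much fuller stop-word list.

```python
import re
from typing import List

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "and", "of", "to", "i", "was"}


def preprocess(review: str) -> List[str]:
    """Lowercase a review, replace punctuation with spaces, and drop stop words."""
    text = review.lower()
    # Anything that is not a lowercase letter, digit, or whitespace becomes a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = text.split()
    return [tok for tok in tokens if tok not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("This phone is GREAT, and the battery lasts!"))
```

The resulting token list could then be turned into features for whichever supervised classifier is chosen; that step is left out here because the abstract does not name the models used.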
Pub Date: 2022-06-07 DOI: 10.17977/um018v5i12022p87-100
U. Pujianto, Muhammad Iqbal Akbar, Niendhitta Tamia Lassela, D. Sutaji
Class imbalance in a dataset is a common problem in classification, and training on imbalanced data can degrade classifier performance. Resampling is one solution to this problem. This study used 100 datasets from three sources: the UCI Machine Learning Repository, Kaggle, and OpenML. Each dataset goes through three processing stages: resampling, classification, and significance testing, in which a paired t-test compares the performance evaluation values of each combination of classifier and resampling technique. The resampling techniques used are Random Undersampling, Random Oversampling, and SMOTE. The classifiers used are the Naïve Bayes Classifier, Decision Tree, and Neural Network. The resulting accuracy, precision, recall, and F-measure values are tested with paired t-tests to determine whether classifier performance differs significantly between datasets that were not resampled and datasets that were. The paired t-test is also used to find the combinations of classifier and resampling technique that give significant results. The study obtained two results. First, resampling imbalanced datasets substantially affects classifier performance compared with leaving the datasets unresampled. Second, the Neural Network without resampling gives significant results in terms of accuracy, while the Neural Network combined with SMOTE gives significant results in terms of precision, recall, and F-measure.
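The two random resampling techniques named in the abstract can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the function names, the `(features, label)` sample layout, and the `seed` parameter are all hypothetical. SMOTE is omitted because it synthesizes new minority-class points by interpolation rather than duplicating or discarding existing ones.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

# Assumed sample layout for this sketch: (feature vector, class label).
Sample = Tuple[Sequence[float], str]


def class_counts(samples: List[Sample]) -> Counter:
    """Count how many samples belong to each class label."""
    return Counter(label for _, label in samples)


def _group_by_label(samples: List[Sample]) -> Dict[str, List[Sample]]:
    groups: Dict[str, List[Sample]] = defaultdict(list)
    for sample in samples:
        groups[sample[1]].append(sample)
    return groups


def random_undersample(samples: List[Sample], seed: int = 0) -> List[Sample]:
    """Randomly discard samples until every class matches the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_label(samples)
    target = min(len(group) for group in groups.values())
    resampled: List[Sample] = []
    for group in groups.values():
        resampled.extend(rng.sample(group, target))  # draw without replacement
    return resampled


def random_oversample(samples: List[Sample], seed: int = 0) -> List[Sample]:
    """Randomly duplicate samples until every class matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_label(samples)
    target = max(len(group) for group in groups.values())
    resampled: List[Sample] = []
    for group in groups.values():
        resampled.extend(group)
        # Draw the missing samples with replacement from the same class.
        resampled.extend(rng.choices(group, k=target - len(group)))
    return resampled


if __name__ == "__main__":
    # Made-up imbalanced data: 8 "neg" samples and 2 "pos" samples.
    data = [((float(i),), "neg") for i in range(8)] + [((float(i),), "pos") for i in range(2)]
    print(class_counts(random_undersample(data)))
    print(class_counts(random_oversample(data)))
```

Undersampling throws information away, while oversampling repeats minority samples exactly, which is the trade-off that motivates comparing both against SMOTE.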
Title: The Effect of Resampling on Classifier Performance: an Empirical Study. Journal: Knowledge Engineering and Data Science. DOI URL: https://doi.org/10.17977/um018v5i12022p87-100
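The significance-testing stage in this study relies on a paired t-test, which compares two sets of scores measured on the same datasets. The sketch below computes only the t statistic, t = mean(d) / (sd(d) / sqrt(n)), where d holds the per-dataset differences and the test has n - 1 degrees of freedom. The function name and the scores in the demo are made up for illustration and are not results from the paper.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def paired_t_statistic(before: Sequence[float], after: Sequence[float]) -> float:
    """Return the paired t statistic for two equally long score sequences.

    Each position is assumed to hold the scores of one dataset, for example
    without resampling (before) and with resampling (after).
    """
    if len(before) != len(after):
        raise ValueError("both sequences must have the same length")
    if len(before) < 2:
        raise ValueError("at least two pairs are required")
    diffs = [a - b for a, b in zip(after, before)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


if __name__ == "__main__":
    # Hypothetical scores for four datasets, chosen only to exercise the function.
    scores_without = [1.0, 2.0, 3.0, 4.0]
    scores_with = [2.0, 4.0, 5.0, 8.0]
    print(paired_t_statistic(scores_without, scores_with))
```

To decide significance, the statistic would be compared against the t distribution with n - 1 degrees of freedom at the chosen significance level; that lookup is not shown here.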