{"title":"Sentiment Analysis On YouTube Comments Using Word2Vec and Random Forest","authors":"S. Khomsah","doi":"10.31315/TELEMATIKA.V18I1.4493","DOIUrl":null,"url":null,"abstract":"Purpose: This study aims to determine the accuracy of sentiment classification using the Random-Forest, and Word2Vec Skip-gram used for features extraction. Word2Vec is one of the effective methods that represent aspects of word meaning and, it helps to improve sentiment classification accuracy.Methodology: The research data consists of 31947 comments downloaded from the YouTube channel for the 2019 presidential election debate. The dataset consists of 23612 positive comments and 8335 negative comments. To avoid bias, we balance the amount of positive and negative data using oversampling. We use Skip-gram to extract features word. The Skip-gram will produce several features around the word the context (input word). Each of these features contains a weight. The feature weight of each comment is calculated by an average-based approach. Random Forest is used to building a sentiment classification model. Experiments were carried out several times with different epoch and window parameters. The performance of each model experiment was measured by cross-validation.Result: Experiments using epochs 1, 5, and 20 and window sizes of 3, 5, and 10, obtain the average accuracy of the model is 90.1% to 91%. However, the results of testing reach an accuracy between 88.77% and 89.05%. But accuracy of the model little bit lower than the accuracy model also was not significant. In the next experiment, it recommended using the number of epochs and the window size greater than twenty epochs and ten windows, so that accuracy increasing significantly.Value: The number of epoch and window sizes on the Skip-Gram affect accuracy. More and more epoch and window sizes affect increasing the accuracy.","PeriodicalId":31716,"journal":{"name":"Telematika","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2021-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Telematika","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.31315/TELEMATIKA.V18I1.4493","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8
Abstract
Purpose: This study aims to determine the accuracy of sentiment classification using the Random-Forest, and Word2Vec Skip-gram used for features extraction. Word2Vec is one of the effective methods that represent aspects of word meaning and, it helps to improve sentiment classification accuracy.Methodology: The research data consists of 31947 comments downloaded from the YouTube channel for the 2019 presidential election debate. The dataset consists of 23612 positive comments and 8335 negative comments. To avoid bias, we balance the amount of positive and negative data using oversampling. We use Skip-gram to extract features word. The Skip-gram will produce several features around the word the context (input word). Each of these features contains a weight. The feature weight of each comment is calculated by an average-based approach. Random Forest is used to building a sentiment classification model. Experiments were carried out several times with different epoch and window parameters. The performance of each model experiment was measured by cross-validation.Result: Experiments using epochs 1, 5, and 20 and window sizes of 3, 5, and 10, obtain the average accuracy of the model is 90.1% to 91%. However, the results of testing reach an accuracy between 88.77% and 89.05%. But accuracy of the model little bit lower than the accuracy model also was not significant. In the next experiment, it recommended using the number of epochs and the window size greater than twenty epochs and ten windows, so that accuracy increasing significantly.Value: The number of epoch and window sizes on the Skip-Gram affect accuracy. More and more epoch and window sizes affect increasing the accuracy.