The Effect of Word2Vec Parameters on Deep Learning Performance in Sentiment Classification (Pengaruh Parameter Word2Vec terhadap Performa Deep Learning pada Klasifikasi Sentimen)
Dwi Intan Af’idah, Dairoh Dairoh, Sharfina Febbi Handayani, Riszki Wijayatun Pratiwi
Jurnal Informatika Jurnal Pengembangan IT, 2021-10-01
DOI: 10.30591/JPIT.V6I3.3016 (https://doi.org/10.30591/JPIT.V6I3.3016)
Sentiment classification on big data is difficult, and deep learning can overcome this difficulty. Before a deep learning model is trained and tested, word features must be extracted. Word2Vec is often used for word feature extraction in sentiment classification pre-training because it captures the semantic meaning of text by assigning similar vectors to words with similar meanings. Word2Vec has three parameters that affect the model learning process: architecture, evaluation method, and dimension. This study aims to determine the effect of each Word2Vec parameter on deep learning performance in sentiment classification, using the accuracy of the deep learning model as the measure. The results indicate that all three Word2Vec parameters influence the performance of the deep learning model. The combination that produces the highest average accuracy is the CBOW (Continuous Bag of Words) architecture, the Hierarchical Softmax evaluation method, and a dimension of 100. CBOW performs better because it is slightly more accurate for frequent words, and the research dataset contains many frequent words. Hierarchical Softmax performs better because it uses a binary tree model in which rarely occurring words inherit the vector representations of the nodes above them. A dimension of 100 yields better accuracy because it is in line with the dataset size of 10,000 reviews.
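
The three parameters discussed in the abstract can be laid out as a small experiment grid. The following is a minimal sketch, not the authors' code: it assumes the alternatives compared were skip-gram (for architecture) and negative sampling (for evaluation method), and the dimensions 200 and 300 are purely hypothetical, since the abstract names only CBOW, Hierarchical Softmax, and 100. The helper names `word2vec_kwargs` and `parameter_grid` are illustrative. The keyword names follow the gensim 4.x `Word2Vec` constructor (`sg`, `hs`, `negative`, `vector_size`), but the sketch itself uses only the standard library.

```python
from itertools import product

# Candidate values. Only CBOW, hierarchical softmax, and 100 appear in the
# abstract; every other entry is an assumption made for illustration.
ARCHITECTURES = {"cbow": 0, "skipgram": 1}  # value of the `sg` flag
EVALUATION_METHODS = {"hierarchical_softmax": 1, "negative_sampling": 0}  # `hs` flag
DIMENSIONS = [100, 200, 300]  # value of `vector_size`


def word2vec_kwargs(architecture, evaluation, dimension, negative_samples=5):
    """Translate one parameter combination into gensim-style keyword arguments."""
    use_hs = EVALUATION_METHODS[evaluation]
    return {
        "sg": ARCHITECTURES[architecture],
        "hs": use_hs,
        # Negative sampling is disabled whenever hierarchical softmax is used.
        "negative": 0 if use_hs else negative_samples,
        "vector_size": dimension,
    }


def parameter_grid():
    """Yield (label, kwargs) for every combination of the three parameters."""
    for arch, ev, dim in product(ARCHITECTURES, EVALUATION_METHODS, DIMENSIONS):
        yield (arch, ev, dim), word2vec_kwargs(arch, ev, dim)


grid = dict(parameter_grid())

# The combination the abstract reports as having the highest average accuracy.
best_reported = grid[("cbow", "hierarchical_softmax", 100)]
print(len(grid), best_reported)
```

Each dictionary could then be unpacked into `gensim.models.Word2Vec(sentences, **kwargs)` (gensim 3.x names the dimension argument `size` instead of `vector_size`). Averaging the downstream classifier's accuracy over all runs that share a given parameter value would give a per-parameter comparison of the kind the abstract describes.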