Unsupervised Twitter Sentiment Analysis on The Revision of Indonesian Code Law and the Anti-Corruption Law using Combination Method of Opinion Word and Agglomerative Hierarchical Clustering
Nur Restu Prayoga, Tresna Maulana Fahrudin, Made Kamisutara, Angga Rahagiyanto, T. Alfath, Latipah, Slamet Winardi, Kunto Eko Susilo
EMITTER-International Journal of Engineering Technology. Published 2020-06-02. DOI: 10.24003/emitter.v8i1.477 (https://doi.org/10.24003/emitter.v8i1.477)
Citations: 3
Abstract
Public rejection of the ratification of the revised Indonesian Code Law, known as RKUHP, and of the Corruption Law has raised opinions from many perspectives on social media. Twitter, one of the platforms affected, has more than 19.5 million users in Indonesia and lets people share views, arguments, information, and opinions from all points of view. Because Twitter's user base is so diverse, a system is needed to determine the tendency of opinion toward a given issue or object. This study analyzes whether tweets from Twitter users rejecting the revision of the laws carry positive or negative sentiment, using the Agglomerative Hierarchical Clustering method. The data were obtained by crawling tweets with the hashtag #ReformasiDikorupsi. The next stage is pre-processing, which consists of case folding, tokenizing, cleansing, sanitizing, and stemming. Feature extraction uses opinion words and Term Frequency (TF) and is performed automatically. In the clustering stage, tweets are grouped into two clusters using three linkage approaches: single linkage, complete linkage, and average linkage. Evaluation uses the error ratio, the confusion matrix, and the silhouette coefficient. On 2,408 tweets, the highest accuracy obtained is 61.6%, a result the authors describe as quite good.
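The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The opinion lexicon, the toy English tweets, the two-dimensional feature (term frequency of positive and of negative opinion words), and all function names are assumptions made for illustration. Stemming is omitted, and evaluation is reduced to accuracy under the better of the two cluster-to-sentiment mappings.

```python
from collections import Counter
from itertools import combinations
from math import dist

# Hypothetical opinion lexicon and toy tweets (not taken from the paper's data).
POSITIVE = {"good", "support", "agree"}
NEGATIVE = {"bad", "reject", "corrupt"}

TWEETS = [
    ("Support the revision, good move", "pos"),
    ("I agree, good law", "pos"),
    ("Good good, I support and agree", "pos"),
    ("Reject this bad law", "neg"),
    ("Corrupt and bad, reject it", "neg"),
    ("Bad bad corrupt revision, reject", "neg"),
]


def preprocess(text):
    """Case folding, cleansing of non-letter characters, whitespace tokenizing."""
    cleaned = "".join(ch if ch.isalpha() else " " for ch in text.lower())
    return cleaned.split()


def features(tokens):
    """Assumed feature: (TF of positive opinion words, TF of negative ones)."""
    tf = Counter(tokens)
    pos = sum(tf[w] for w in POSITIVE)
    neg = sum(tf[w] for w in NEGATIVE)
    return (float(pos), float(neg))


# Each linkage reduces the list of pairwise distances between two clusters.
LINKAGES = {
    "single": min,
    "complete": max,
    "average": lambda ds: sum(ds) / len(ds),
}


def agglomerative(points, n_clusters, linkage):
    """Naive agglomerative clustering; returns one cluster id per point."""
    combine = LINKAGES[linkage]
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: combine(
                [dist(points[i], points[j])
                 for i in clusters[ab[0]] for j in clusters[ab[1]]]
            ),
        )
        clusters[a] += clusters[b]  # a < b, so deleting b keeps a valid
        del clusters[b]
    labels = [0] * len(points)
    for cid, members in enumerate(clusters):
        for i in members:
            labels[i] = cid
    return labels


def accuracy(pred, gold):
    """Accuracy under the better of the two cluster-to-sentiment mappings."""
    hits = sum(1 for p, g in zip(pred, gold) if (p == 0) == (g == "pos"))
    return max(hits, len(gold) - hits) / len(gold)


POINTS = [features(preprocess(text)) for text, _ in TWEETS]
GOLD = [label for _, label in TWEETS]
RESULTS = {name: accuracy(agglomerative(POINTS, 2, name), GOLD)
           for name in LINKAGES}

if __name__ == "__main__":
    for name, acc in RESULTS.items():
        print(f"{name} linkage accuracy: {acc:.2f}")
```

On this deliberately easy toy data all three linkages separate the two groups. Real tweets are far noisier, which is consistent with the much lower accuracy reported in the abstract. The naive merge loop is cubic in the number of tweets, so a library implementation would be preferable at the scale of 2,408 tweets.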