Salsabila Mazya Permataning Tyas, R. Sarno, Agus Tri Haryono, Kelly Rossa Sungkono
2023 International Conference on Computer Science, Information Technology and Engineering (ICCoSITE), published 2023-02-16
DOI: 10.1109/ICCoSITE57641.2023.10127725 (https://doi.org/10.1109/ICCoSITE57641.2023.10127725)
A Robustly Optimized BERT using Random Oversampling for Analyzing Imbalanced Stock News Sentiment Data
Stock news is one of the information sources that can be used to monitor stock prices. It usually carries positive or negative sentiment that can affect stock prices, so sentiment analysis is needed to process it. The stock news dataset used here is taken from Kaggle, and its positive and negative sentiment classes are imbalanced. This research addresses the imbalance with random oversampling, which works by randomly replicating samples of the minority class. It evaluates several text pre-processing scenarios, each with different stages, with the aim of obtaining high accuracy. The classification method is RoBERTa, a robustly optimized variant of BERT (Bidirectional Encoder Representations from Transformers). The paper also compares RoBERTa against Machine Learning (ML) baselines, namely Multinomial Naïve Bayes, Bernoulli Naïve Bayes, Support Vector Machine, Random Forest Classifier, and Logistic Regression, using two text representations, TF-IDF and Word2Vec. The best result is obtained with RoBERTa under the fourth pre-processing scenario, which only removes hashtags and applies no punctuation removal, number removal, number conversion, stop-word removal, or lemmatization. This configuration reaches 0.85 precision, 0.84 recall, 0.84 F1-score, and 86% accuracy.
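Random oversampling, as described in the abstract, can be illustrated with a minimal sketch. This is not the authors' code: the function name `random_oversample`, the fixed seed, and the toy headlines are assumptions made purely for illustration, and the sketch uses only the Python standard library rather than whatever tooling the paper relied on.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class until
    every class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    out_texts, out_labels = list(texts), list(labels)
    for label, count in counts.items():
        pool = [t for t, y in zip(texts, labels) if y == label]
        # Draw with replacement so a small class can grow past its original size.
        extra = rng.choices(pool, k=target - count)
        out_texts.extend(extra)
        out_labels.extend([label] * len(extra))
    return out_texts, out_labels


# Toy headlines invented for illustration (not taken from the Kaggle dataset).
texts = [
    "shares surge on earnings",
    "profit beats forecast",
    "stock rallies",
    "company faces lawsuit",
]
labels = ["positive", "positive", "positive", "negative"]

balanced_texts, balanced_labels = random_oversample(texts, labels)
print(Counter(balanced_labels))
```

Oversampling of this kind is usually applied to the training split only, so that duplicated samples do not leak into evaluation; the abstract does not say how the authors split their data.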