{"title":"BERT for Twitter Sentiment Analysis: Achieving High Accuracy and Balanced Performance","authors":"Oladri Renuka, Niranchana Radhakrishnan","doi":"10.36548/jtcsst.2024.1.003","DOIUrl":null,"url":null,"abstract":"The Bidirectional Encoder Representations from Transformers (BERT) model is used in this work to analyse sentiment on Twitter data. A Kaggle dataset of manually annotated and anonymized COVID-19-related tweets was used to refine the model. Location, tweet date, original tweet content, and sentiment labels are all included in the dataset. When compared to the Multinomial Naive Bayes (MNB) baseline, BERT's performance was assessed, and it achieved an overall accuracy of 87% on the test set. The results indicated that for negative feelings, the accuracy was 0.93, the recall was 0.84, and the F1-score was 0.88; for neutral sentiments, the precision was 0.86, the recall was 0.78, and the F1-score was 0.82; and for positive sentiments, the precision was 0.82, the recall was 0.94, and the F1-score was 0.88. The model's proficiency with the linguistic nuances of Twitter, including slang and sarcasm, was demonstrated. This study also identifies the flaws of BERT and makes recommendations for future research paths, such as the integration of external knowledge and alternative designs.","PeriodicalId":107574,"journal":{"name":"Journal of Trends in Computer Science and Smart Technology","volume":"162 2","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Trends in Computer Science and Smart Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.36548/jtcsst.2024.1.003","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
The Bidirectional Encoder Representations from Transformers (BERT) model is used in this work to analyse sentiment in Twitter data. A Kaggle dataset of manually annotated and anonymized COVID-19-related tweets was used to fine-tune the model; each record includes the location, tweet date, original tweet content, and a sentiment label. BERT's performance was assessed against a Multinomial Naive Bayes (MNB) baseline, and it achieved an overall accuracy of 87% on the test set. Per class, negative sentiment scored a precision of 0.93, a recall of 0.84, and an F1-score of 0.88; neutral sentiment a precision of 0.86, a recall of 0.78, and an F1-score of 0.82; and positive sentiment a precision of 0.82, a recall of 0.94, and an F1-score of 0.88. These results demonstrate the model's command of the linguistic nuances of Twitter, including slang and sarcasm. The study also identifies BERT's limitations and recommends directions for future research, such as the integration of external knowledge and alternative architectures.
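
The sketch below illustrates the kind of pipeline the abstract describes: fine-tune BERT for three-class tweet sentiment, fit an MNB baseline on bag-of-words counts, and report the same per-class precision/recall/F1 breakdown. It is a minimal illustration, not the authors' code; the toy texts, the `bert-base-uncased` checkpoint, and all hyperparameters are assumptions, and the real study trained on the full Kaggle COVID-19 tweet dataset.

```python
# Hedged sketch of the abstract's pipeline; toy data and hyperparameters
# are illustrative stand-ins, not the paper's actual configuration.
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report

LABELS = ["negative", "neutral", "positive"]

# Toy stand-ins for the annotated tweets; the real dataset also carries
# location and date fields alongside the tweet text and sentiment label.
train_texts = ["this lockdown is awful", "stores open at 9am today",
               "great news on the vaccine rollout"]
train_labels = [0, 1, 2]
test_texts = ["masks sold out again, brilliant", "new case numbers reported"]
test_labels = [0, 1]

# --- BERT: tokenize, fine-tune briefly, predict -------------------------
tok = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS))
enc = tok(train_texts, padding=True, truncation=True,
          max_length=128, return_tensors="pt")
labels = torch.tensor(train_labels)

optim = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few steps only; real fine-tuning runs epochs over batches
    out = model(**enc, labels=labels)
    out.loss.backward()
    optim.step()
    optim.zero_grad()

model.eval()
with torch.no_grad():
    test_enc = tok(test_texts, padding=True, truncation=True,
                   max_length=128, return_tensors="pt")
    bert_pred = model(**test_enc).logits.argmax(dim=-1).tolist()

# --- Multinomial Naive Bayes baseline on bag-of-words counts ------------
vec = CountVectorizer()
mnb = MultinomialNB().fit(vec.fit_transform(train_texts), train_labels)
mnb_pred = mnb.predict(vec.transform(test_texts))

# Per-class precision/recall/F1, the same breakdown the abstract reports.
print(classification_report(test_labels, bert_pred, labels=[0, 1, 2],
                            target_names=LABELS, zero_division=0))
print(classification_report(test_labels, mnb_pred, labels=[0, 1, 2],
                            target_names=LABELS, zero_division=0))
```

With the full dataset in place of the toy lists, `classification_report` yields exactly the per-class precision, recall, and F1 figures quoted above, which is how BERT's balanced performance can be compared directly against the MNB baseline.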