Vaccine sentiment analysis using BERT + NBSVM and geo-spatial approaches
Areeba Umair, Elio Masciari, Muhammad Habib Ullah
Journal of Supercomputing, pp. 1-31. Published 2023-05-07.
DOI: 10.1007/s11227-023-05319-8 (https://doi.org/10.1007/s11227-023-05319-8)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10164419/pdf/
Citations: 0
Abstract
Since the coronavirus disease emerged in 2019 (hereafter referred to as COVID-19), millions of people worldwide have been affected by the pandemic, which has significantly changed our habits in various ways. In the effort to eradicate the disease, the unprecedentedly fast development of vaccines was of great help, together with the adoption of strict preventive measures such as lockdowns. Worldwide provision of vaccines was therefore crucial to achieving the maximum immunization of the population. However, the fast development of vaccines, driven by the urgency of limiting the pandemic, caused skeptical reactions in a large part of the population. In particular, people's hesitancy to get vaccinated was an additional obstacle in fighting COVID-19. To improve this situation, it is important to understand people's sentiments about vaccines, so that appropriate actions can be taken to better inform the population. People continuously share their feelings and sentiments on social media. Analyzing those opinions properly is therefore an important challenge in providing accurate information and countering misinformation. Sentiment analysis (Wankhade et al. in Artif Intell Rev 55(7):5731-5780, 2022. 10.1007/s10462-022-10144-1) is a natural language processing technique for identifying and classifying people's feelings, mainly in text data. It uses machine learning algorithms and other computational techniques to analyze large volumes of text and determine whether they express positive, negative, or neutral sentiment. Sentiment analysis is widely used in industries such as marketing, customer service, and healthcare. It helps extract actionable insights from customer feedback, social media posts, and other forms of unstructured textual data.
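The three-way labeling described above (positive, negative, or neutral) can be illustrated with a deliberately simple lexicon scorer. This sketch averages word weights from a tiny hand-made lexicon and maps the score to a class with a zero threshold.

The lexicon `TOY_LEXICON`, the averaging rule, and the threshold are hypothetical assumptions for illustration. They are not the AI tool or the BERT + NBSVM model used in the paper.

```python
import re

# Hypothetical toy lexicon (word -> polarity weight); not taken from the paper.
TOY_LEXICON = {
    "safe": 1.0,
    "effective": 1.0,
    "grateful": 1.0,
    "protected": 0.5,
    "dangerous": -1.0,
    "scared": -1.0,
    "rushed": -0.5,
}


def polarity(text: str) -> float:
    """Average lexicon weight of the known words in `text`; 0.0 if none are known."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = [TOY_LEXICON[t] for t in tokens if t in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def label(text: str, threshold: float = 0.0) -> str:
    """Map the polarity score to one of the three sentiment classes."""
    score = polarity(text)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"
```

Texts with no lexicon hits fall back to "neutral". This is also why purely lexicon-based scoring struggles on short tweets and motivates learned models.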
In this paper, sentiment analysis is used to study people's reactions to COVID-19 vaccines. The aim is to provide insights that improve public understanding of the vaccines' correct usage and possible advantages. We propose a framework that leverages artificial intelligence (AI) methods to classify tweets according to their polarity values. We analyzed Twitter data related to COVID-19 vaccines after applying appropriate pre-processing. More specifically, we identified the word clouds of negative, positive, and neutral words, using an AI tool to determine the sentiment of the tweets. After this pre-processing step, we used the BERT + NBSVM model to classify people's sentiments about vaccines.

The choice to combine Bidirectional Encoder Representations from Transformers (BERT) with Naive Bayes and Support Vector Machine (NBSVM) is motivated by a limitation of BERT-based approaches. They only leverage encoder layers, which results in lower performance on short texts such as the tweets used in our analysis. This limitation can be mitigated by Naive Bayes and Support Vector Machine approaches, which achieve higher performance in short-text sentiment analysis. We therefore combined BERT features and NBSVM features in a flexible framework for identifying vaccine sentiment.

Moreover, we enriched our results with a spatial analysis of the data, using geocoding, visualization, and spatial correlation analysis. This analysis suggests the most suitable vaccination centers to users based on the sentiment analysis outcomes. Because the available public data are not massive, a distributed architecture is not needed to run our experiments. However, we discuss a high-performance architecture that will be used if the collected data scale up dramatically.
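The NBSVM component can be sketched in its commonly used form. Naive Bayes log-count ratios rescale each binary feature, and a linear SVM is then trained on the rescaled features. The ratio is r = log((p / ||p||_1) / (q / ||q||_1)), where p and q are the smoothed feature counts of the positive and negative classes.

This pure-Python sketch rests on several assumptions:

- labels are binary (+1/-1) and features are binarized unigrams;
- the SVM is trained by stochastic subgradient descent on the L2-regularized hinge loss;
- the function names and hyperparameters (`alpha`, `lam`, `lr`, `epochs`) are hypothetical.

The BERT features that the paper combines with NBSVM are not modeled here.

```python
import math
import re


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def binarized_features(texts, vocab):
    """One vector per text: 1.0 where the vocabulary word occurs, else 0.0."""
    index = {w: i for i, w in enumerate(vocab)}
    rows = []
    for text in texts:
        row = [0.0] * len(vocab)
        for tok in tokenize(text):
            if tok in index:
                row[index[tok]] = 1.0
        rows.append(row)
    return rows


def nb_log_count_ratio(X, y, alpha=1.0):
    """r = log((p/|p|_1) / (q/|q|_1)) from alpha-smoothed per-class feature counts."""
    n = len(X[0])
    p = [alpha] * n  # counts for label +1
    q = [alpha] * n  # counts for label -1
    for row, label in zip(X, y):
        target = p if label == 1 else q
        for j, v in enumerate(row):
            target[j] += v
    p_sum, q_sum = sum(p), sum(q)
    return [math.log((p[j] / p_sum) / (q[j] / q_sum)) for j in range(n)]


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=100):
    """Stochastic subgradient descent on the L2-regularized hinge loss (bias unregularized)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for row, label in zip(X, y):
            margin = label * (sum(wj * xj for wj, xj in zip(w, row)) + b)
            # Gradient step on the L2 penalty.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # Hinge-loss subgradient for a margin violation.
                w = [wj + lr * label * xj for wj, xj in zip(w, row)]
                b += lr * label
    return w, b


def fit_nbsvm(texts, labels, alpha=1.0, lam=0.01, lr=0.1, epochs=100):
    vocab = sorted({tok for t in texts for tok in tokenize(t)})
    X = binarized_features(texts, vocab)
    r = nb_log_count_ratio(X, labels, alpha)
    # Scale every binary feature by its NB log-count ratio before the SVM.
    X_nb = [[rj * xj for rj, xj in zip(r, row)] for row in X]
    w, b = train_linear_svm(X_nb, labels, lam, lr, epochs)
    return vocab, r, w, b


def predict(model, text):
    vocab, r, w, b = model
    x = binarized_features([text], vocab)[0]
    score = sum(wj * rj * xj for wj, rj, xj in zip(w, r, x)) + b
    return 1 if score >= 0 else -1
```

Words that occur mostly in one class receive a large |r| and dominate the SVM score. This is the mechanism usually credited for NBSVM's strength on short texts.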
We compared our approach with state-of-the-art methods using the most widely used metrics: accuracy, precision, recall, and F-measure. The proposed BERT + NBSVM model outperformed the alternative models, with the following results:

- positive sentiments: 73% accuracy, 71% precision, 88% recall, and 73% F-measure;
- negative sentiments: 73% accuracy, 71% precision, 74% recall, and 73% F-measure.

These results are discussed in detail in the following sections. The use of artificial intelligence methods and social media analysis can lead to a better understanding of people's reactions and opinions about any trending topic. For health-related topics such as COVID-19 vaccines in particular, accurate sentiment identification can be crucial for implementing public health policies. Findings on user opinions about vaccines can help policymakers design appropriate strategies and tailor vaccination protocols to people's feelings, in order to provide a better public service. To this end, we leveraged geospatial information to support effective recommendations for vaccination centers.
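For reference, the four evaluation metrics can be computed from true and predicted labels, treating one class (for example, positive) as the target class. This minimal sketch makes two assumptions:

- the F-measure is the balanced F1 score, F1 = 2PR / (P + R);
- accuracy is the overall fraction of correct predictions.

The function name is hypothetical.

```python
def classification_metrics(y_true, y_pred, positive):
    """Overall accuracy plus precision, recall and F1 for the class `positive`."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Calling the function once per class yields per-class figures like those reported above.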
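One simple way to turn geocoded locations into vaccination-center suggestions is to rank centers by great-circle distance from the user. The sketch below uses the haversine formula with a mean Earth radius of 6371 km.

The `(name, lat, lon)` center format and the nearest-first ranking rule are hypothetical. The sentiment-dependent criteria the paper uses to select centers are not reproduced here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    # Clamp to guard against tiny floating-point overshoot above 1.0.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_centers(user_lat, user_lon, centers, k=1):
    """Rank (name, lat, lon) centers by distance from the user and return the k closest."""
    ranked = sorted(
        ((name, haversine_km(user_lat, user_lon, lat, lon)) for name, lat, lon in centers),
        key=lambda item: item[1],
    )
    return ranked[:k]
```

For example, `nearest_centers(user_lat, user_lon, centers, k=3)` returns the three closest centers with their distances in kilometres. A sentiment-aware variant could reweight this ranking.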
About the journal:
The Journal of Supercomputing publishes papers on the technology, architecture and systems, algorithms, languages and programs, performance measures and methods, and applications of all aspects of Supercomputing. Tutorial and survey papers are intended for workers and students in the fields associated with and employing advanced computer systems. The journal also publishes letters to the editor, especially in areas relating to policy, succinct statements of paradoxes, intuitively puzzling results, partial results and real needs.
Published theoretical and practical papers are advanced, in-depth treatments describing new developments and new ideas. Each includes an introduction summarizing prior, directly pertinent work that is useful for the reader to understand, in order to appreciate the advances being described.