使用BERT+NBSVM和地理空间方法进行疫苗情绪分析。

IF 2.5 3区计算机科学 Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Journal of Supercomputing Pub Date : 2023-05-07 DOI:10.1007/s11227-023-05319-8

Areeba Umair, Elio Masciari, Muhammad Habib Ullah

{"title":"使用BERT+NBSVM和地理空间方法进行疫苗情绪分析。","authors":"Areeba Umair, Elio Masciari, Muhammad Habib Ullah","doi":"10.1007/s11227-023-05319-8","DOIUrl":null,"url":null,"abstract":"Since the spread of the coronavirus flu in 2019 (hereafter referred to as COVID-19), millions of people worldwide have been affected by the pandemic, which has significantly impacted our habits in various ways. In order to eradicate the disease, a great help came from unprecedentedly fast vaccines development along with strict preventive measures adoption like lockdown. Thus, world wide provisioning of vaccines was crucial in order to achieve the maximum immunization of population. However, the fast development of vaccines, driven by the urge of limiting the pandemic caused skeptical reactions by a vast amount of population. More specifically, the people's hesitancy in getting vaccinated was an additional obstacle in fighting COVID-19. To ameliorate this scenario, it is important to understand people's sentiments about vaccines in order to take proper actions to better inform the population. As a matter of fact, people continuously update their feelings and sentiments on social media, thus a proper analysis of those opinions is an important challenge for providing proper information to avoid misinformation. More in detail, sentiment analysis (Wankhade et al. in Artif Intell Rev 55(7):5731-5780, 2022. 10.1007/s10462-022-10144-1) is a powerful technique in natural language processing that enables the identification and classification of people feelings (mainly) in text data. It involves the use of machine learning algorithms and other computational techniques to analyze large volumes of text and determine whether they express positive, negative or neutral sentiment. Sentiment analysis is widely used in industries such as marketing, customer service, and healthcare, among others, to gain actionable insights from customer feedback, social media posts, and other forms of unstructured textual data. In this paper, Sentiment Analysis will be used to elaborate on people reaction to COVID-19 vaccines in order to provide useful insights to improve the correct understanding of their correct usage and possible advantages. In this paper, a framework that leverages artificial intelligence (AI) methods is proposed for classifying tweets based on their polarity values. We analyzed Twitter data related to COVID-19 vaccines after the most appropriate pre-processing on them. More specifically, we identified the word-cloud of negative, positive, and neutral words using an artificial intelligence tool to determine the sentiment of tweets. After this pre-processing step, we performed classification using the BERT + NBSVM model to classify people's sentiments about vaccines. The reason for choosing to combine bidirectional encoder representations from transformers (BERT) and Naive Bayes and support vector machine (NBSVM ) can be understood by considering the limitation of BERT-based approaches, which only leverage encoder layers, resulting in lower performance on short texts like the ones used in our analysis. Such a limitation can be ameliorated by using Naive Bayes and Support Vector Machine approaches that are able to achieve higher performance in short text sentiment analysis. Thus, we took advantage of both BERT features and NBSVM features to define a flexible framework for our sentiment analysis goal related to vaccine sentiment identification. Moreover, we enrich our results with spatial analysis of the data by using geo-coding, visualization, and spatial correlation analysis to suggest the most suitable vaccination centers to users based on the sentiment analysis outcomes. In principle, we do not need to implement a distributed architecture to run our experiments as the available public data are not massive. However, we discuss a high-performance architecture that will be used if the collected data scales up dramatically. We compared our approach with the state-of-art methods by comparing most widely used metrics like Accuracy, Precision, Recall and F-measure. The proposed BERT + NBSVM outperformed alternative models by achieving 73% accuracy, 71% precision, 88% recall and 73% F-measure for classification of positive sentiments while 73% accuracy, 71% precision, 74% recall and 73% F-measure for classification of negative sentiments respectively. These promising results will be properly discussed in next sections. The use of artificial intelligence methods and social media analysis can lead to a better understanding of people's reactions and opinions about any trending topic. However, in the case of health-related topics like COVID-19 vaccines, proper sentiment identification could be crucial for implementing public health policies. More in detail, the availability of useful findings on user opinions about vaccines can help policymakers design proper strategies and implement ad-hoc vaccination protocols according to people's feelings, in order to provide better public service. To this end, we leveraged geospatial information to support effective recommendations for vaccination centers.","PeriodicalId":50034,"journal":{"name":"Journal of Supercomputing","volume":" ","pages":"1-31"},"PeriodicalIF":2.5000,"publicationDate":"2023-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10164419/pdf/","citationCount":"0","resultStr":"{\"title\":\"Vaccine sentiment analysis using BERT + NBSVM and geo-spatial approaches.\",\"authors\":\"Areeba Umair, Elio Masciari, Muhammad Habib Ullah\",\"doi\":\"10.1007/s11227-023-05319-8\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Since the spread of the coronavirus flu in 2019 (hereafter referred to as COVID-19), millions of people worldwide have been affected by the pandemic, which has significantly impacted our habits in various ways. In order to eradicate the disease, a great help came from unprecedentedly fast vaccines development along with strict preventive measures adoption like lockdown. Thus, world wide provisioning of vaccines was crucial in order to achieve the maximum immunization of population. However, the fast development of vaccines, driven by the urge of limiting the pandemic caused skeptical reactions by a vast amount of population. More specifically, the people's hesitancy in getting vaccinated was an additional obstacle in fighting COVID-19. To ameliorate this scenario, it is important to understand people's sentiments about vaccines in order to take proper actions to better inform the population. As a matter of fact, people continuously update their feelings and sentiments on social media, thus a proper analysis of those opinions is an important challenge for providing proper information to avoid misinformation. More in detail, sentiment analysis (Wankhade et al. in Artif Intell Rev 55(7):5731-5780, 2022. 10.1007/s10462-022-10144-1) is a powerful technique in natural language processing that enables the identification and classification of people feelings (mainly) in text data. It involves the use of machine learning algorithms and other computational techniques to analyze large volumes of text and determine whether they express positive, negative or neutral sentiment. Sentiment analysis is widely used in industries such as marketing, customer service, and healthcare, among others, to gain actionable insights from customer feedback, social media posts, and other forms of unstructured textual data. In this paper, Sentiment Analysis will be used to elaborate on people reaction to COVID-19 vaccines in order to provide useful insights to improve the correct understanding of their correct usage and possible advantages. In this paper, a framework that leverages artificial intelligence (AI) methods is proposed for classifying tweets based on their polarity values. We analyzed Twitter data related to COVID-19 vaccines after the most appropriate pre-processing on them. More specifically, we identified the word-cloud of negative, positive, and neutral words using an artificial intelligence tool to determine the sentiment of tweets. After this pre-processing step, we performed classification using the BERT + NBSVM model to classify people's sentiments about vaccines. The reason for choosing to combine bidirectional encoder representations from transformers (BERT) and Naive Bayes and support vector machine (NBSVM ) can be understood by considering the limitation of BERT-based approaches, which only leverage encoder layers, resulting in lower performance on short texts like the ones used in our analysis. Such a limitation can be ameliorated by using Naive Bayes and Support Vector Machine approaches that are able to achieve higher performance in short text sentiment analysis. Thus, we took advantage of both BERT features and NBSVM features to define a flexible framework for our sentiment analysis goal related to vaccine sentiment identification. Moreover, we enrich our results with spatial analysis of the data by using geo-coding, visualization, and spatial correlation analysis to suggest the most suitable vaccination centers to users based on the sentiment analysis outcomes. In principle, we do not need to implement a distributed architecture to run our experiments as the available public data are not massive. However, we discuss a high-performance architecture that will be used if the collected data scales up dramatically. We compared our approach with the state-of-art methods by comparing most widely used metrics like Accuracy, Precision, Recall and F-measure. The proposed BERT + NBSVM outperformed alternative models by achieving 73% accuracy, 71% precision, 88% recall and 73% F-measure for classification of positive sentiments while 73% accuracy, 71% precision, 74% recall and 73% F-measure for classification of negative sentiments respectively. These promising results will be properly discussed in next sections. The use of artificial intelligence methods and social media analysis can lead to a better understanding of people's reactions and opinions about any trending topic. However, in the case of health-related topics like COVID-19 vaccines, proper sentiment identification could be crucial for implementing public health policies. More in detail, the availability of useful findings on user opinions about vaccines can help policymakers design proper strategies and implement ad-hoc vaccination protocols according to people's feelings, in order to provide better public service. To this end, we leveraged geospatial information to support effective recommendations for vaccination centers.\",\"PeriodicalId\":50034,\"journal\":{\"name\":\"Journal of Supercomputing\",\"volume\":\" \",\"pages\":\"1-31\"},\"PeriodicalIF\":2.5000,\"publicationDate\":\"2023-05-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10164419/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Supercomputing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s11227-023-05319-8\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Supercomputing","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s11227-023-05319-8","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}

引用次数: 0

摘要

自2019年冠状病毒流感（以下简称新冠肺炎）传播以来，全球数百万人受到疫情的影响，这对我们的生活习惯产生了各种重大影响。为了根除这种疾病，空前快速的疫苗开发以及封锁等严格的预防措施给了我们很大的帮助。因此，在全世界范围内提供疫苗对于实现最大限度的人口免疫至关重要。然而，在限制疫情的冲动推动下，疫苗的快速开发引起了大量民众的怀疑反应。更具体地说，人们对接种疫苗的犹豫是抗击新冠肺炎的另一个障碍。为了改善这种情况，重要的是了解人们对疫苗的看法，以便采取适当行动，更好地向民众提供信息。事实上，人们不断在社交媒体上更新自己的感受和情绪，因此，对这些观点进行适当的分析是提供适当信息以避免错误信息的一个重要挑战。更详细地说，情绪分析（Wankhade等人，见Artif Intell Rev 55（7）：5731-57802022。10.1007/10462-022-10144-1）是自然语言处理中的一种强大技术，它能够（主要）识别和分类文本数据中的人的感受。它涉及使用机器学习算法和其他计算技术来分析大量文本，并确定它们是否表达了积极、消极或中立的情绪。情绪分析广泛用于营销、客户服务和医疗保健等行业，以从客户反馈、社交媒体帖子和其他形式的非结构化文本数据中获得可操作的见解。在本文中，情绪分析将用于阐述人们对新冠肺炎疫苗的反应，以便提供有用的见解，以提高对其正确使用和可能优势的正确理解。在本文中，提出了一个利用人工智能（AI）方法根据推文的极性值对推文进行分类的框架。在对新冠肺炎疫苗进行最适当的预处理后，我们分析了与之相关的推特数据。更具体地说，我们使用人工智能工具来确定推文的情绪，从而确定了负面、正面和中性词的词云。在这个预处理步骤之后，我们使用BERT+NBSVM模型进行分类，以对人们对疫苗的情绪进行分类。选择将来自Transformer（BERT）和Naive Bayes的双向编码器表示与支持向量机（NBSVM）相结合的原因可以通过考虑基于BERT的方法的局限性来理解，这些方法只利用编码器层，导致在短文本上的性能较低，如我们分析中使用的方法。这种限制可以通过使用朴素贝叶斯和支持向量机方法来改善，这些方法能够在短文本情感分析中实现更高的性能。因此，我们利用BERT特征和NBSVM特征为我们与疫苗情绪识别相关的情绪分析目标定义了一个灵活的框架。此外，我们通过使用地理编码、可视化和空间相关性分析对数据进行空间分析，以根据情绪分析结果向用户建议最合适的疫苗接种中心，从而丰富我们的结果。原则上，我们不需要实现分布式架构来运行我们的实验，因为可用的公共数据并不庞大。然而，我们讨论了一种高性能体系结构，如果收集的数据大幅扩展，将使用该体系结构。我们通过比较最广泛使用的指标，如准确性、精密度、召回率和F-measure，将我们的方法与最先进的方法进行了比较。所提出的BERT+NBSVM在积极情绪分类方面的准确率为73%，准确率为71%，召回率为88%，F-测度为73%，而在消极情绪分类方面分别达到73%，准确度为71%，回收率为74%和F-测度，优于其他模型。这些有希望的结果将在下一节中适当讨论。使用人工智能方法和社交媒体分析可以更好地了解人们对任何热门话题的反应和看法。然而，就新冠肺炎疫苗等与健康相关的话题而言，正确的情绪识别可能对实施公共卫生政策至关重要。更详细地说，用户对疫苗的意见的有用发现可以帮助决策者根据人们的感受设计适当的策略和实施特别的疫苗接种协议，以提供更好的公共服务。为此，我们利用地理空间信息支持疫苗接种中心的有效建议。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

摘要图片

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Vaccine sentiment analysis using BERT + NBSVM and geo-spatial approaches.

Since the spread of the coronavirus flu in 2019 (hereafter referred to as COVID-19), millions of people worldwide have been affected by the pandemic, which has significantly impacted our habits in various ways. In order to eradicate the disease, a great help came from unprecedentedly fast vaccines development along with strict preventive measures adoption like lockdown. Thus, world wide provisioning of vaccines was crucial in order to achieve the maximum immunization of population. However, the fast development of vaccines, driven by the urge of limiting the pandemic caused skeptical reactions by a vast amount of population. More specifically, the people's hesitancy in getting vaccinated was an additional obstacle in fighting COVID-19. To ameliorate this scenario, it is important to understand people's sentiments about vaccines in order to take proper actions to better inform the population. As a matter of fact, people continuously update their feelings and sentiments on social media, thus a proper analysis of those opinions is an important challenge for providing proper information to avoid misinformation. More in detail, sentiment analysis (Wankhade et al. in Artif Intell Rev 55(7):5731-5780, 2022. 10.1007/s10462-022-10144-1) is a powerful technique in natural language processing that enables the identification and classification of people feelings (mainly) in text data. It involves the use of machine learning algorithms and other computational techniques to analyze large volumes of text and determine whether they express positive, negative or neutral sentiment. Sentiment analysis is widely used in industries such as marketing, customer service, and healthcare, among others, to gain actionable insights from customer feedback, social media posts, and other forms of unstructured textual data. In this paper, Sentiment Analysis will be used to elaborate on people reaction to COVID-19 vaccines in order to provide useful insights to improve the correct understanding of their correct usage and possible advantages. In this paper, a framework that leverages artificial intelligence (AI) methods is proposed for classifying tweets based on their polarity values. We analyzed Twitter data related to COVID-19 vaccines after the most appropriate pre-processing on them. More specifically, we identified the word-cloud of negative, positive, and neutral words using an artificial intelligence tool to determine the sentiment of tweets. After this pre-processing step, we performed classification using the BERT + NBSVM model to classify people's sentiments about vaccines. The reason for choosing to combine bidirectional encoder representations from transformers (BERT) and Naive Bayes and support vector machine (NBSVM ) can be understood by considering the limitation of BERT-based approaches, which only leverage encoder layers, resulting in lower performance on short texts like the ones used in our analysis. Such a limitation can be ameliorated by using Naive Bayes and Support Vector Machine approaches that are able to achieve higher performance in short text sentiment analysis. Thus, we took advantage of both BERT features and NBSVM features to define a flexible framework for our sentiment analysis goal related to vaccine sentiment identification. Moreover, we enrich our results with spatial analysis of the data by using geo-coding, visualization, and spatial correlation analysis to suggest the most suitable vaccination centers to users based on the sentiment analysis outcomes. In principle, we do not need to implement a distributed architecture to run our experiments as the available public data are not massive. However, we discuss a high-performance architecture that will be used if the collected data scales up dramatically. We compared our approach with the state-of-art methods by comparing most widely used metrics like Accuracy, Precision, Recall and F-measure. The proposed BERT + NBSVM outperformed alternative models by achieving 73% accuracy, 71% precision, 88% recall and 73% F-measure for classification of positive sentiments while 73% accuracy, 71% precision, 74% recall and 73% F-measure for classification of negative sentiments respectively. These promising results will be properly discussed in next sections. The use of artificial intelligence methods and social media analysis can lead to a better understanding of people's reactions and opinions about any trending topic. However, in the case of health-related topics like COVID-19 vaccines, proper sentiment identification could be crucial for implementing public health policies. More in detail, the availability of useful findings on user opinions about vaccines can help policymakers design proper strategies and implement ad-hoc vaccination protocols according to people's feelings, in order to provide better public service. To this end, we leveraged geospatial information to support effective recommendations for vaccination centers.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Journal of Supercomputing 工程技术-工程：电子与电气

CiteScore

6.30

自引率

12.10%

发文量

734

审稿时长

13 months

期刊介绍： The Journal of Supercomputing publishes papers on the technology, architecture and systems, algorithms, languages and programs, performance measures and methods, and applications of all aspects of Supercomputing. Tutorial and survey papers are intended for workers and students in the fields associated with and employing advanced computer systems. The journal also publishes letters to the editor, especially in areas relating to policy, succinct statements of paradoxes, intuitively puzzling results, partial results and real needs. Published theoretical and practical papers are advanced, in-depth treatments describing new developments and new ideas. Each includes an introduction summarizing prior, directly pertinent work that is useful for the reader to understand, in order to appreciate the advances being described.