Using Deep Learning to Identify Linguistic Features that Facilitate or Inhibit the Propagation of Anti- and Pro-Vaccine Content on Social Media
Y. Argyris, Nan Zhang, Bidhan Bashyal, Pang-Ning Tan
2022 IEEE International Conference on Digital Health (ICDH), published 2022-07-01. DOI: 10.1109/ICDH55609.2022.00025 (https://doi.org/10.1109/ICDH55609.2022.00025)
Abstract
Anti-vaccine content is rapidly propagated via social media, fostering vaccine hesitancy, while pro-vaccine content has not matched that success. Despite this disparity in the dissemination of anti- and pro-vaccine posts, the linguistic features that facilitate or inhibit the propagation of vaccine-related content remain poorly understood. Moreover, most prior machine-learning algorithms classified social-media posts into binary categories (e.g., misinformation or not) and have rarely tackled a higher-order classification task based on divergent perspectives about vaccines (e.g., anti-vaccine, pro-vaccine, and neutral). Our objectives are (1) to identify sets of linguistic features that facilitate and inhibit the propagation of vaccine-related content and (2) to compare whether anti-vaccine, pro-vaccine, and neutral tweets contain either set more frequently than the others. To achieve these goals, we collected a large set of social media posts (over 120 million tweets) between Nov. 15 and Dec. 15, 2021, coinciding with the Omicron variant surge. A two-stage framework was developed using a fine-tuned BERT classifier, demonstrating over 99 percent and 80 percent accuracy for binary and ternary classification, respectively. The Linguistic Inquiry and Word Count (LIWC) text analysis tool was then used to count linguistic features in each classified tweet. Our regression results show that anti-vaccine tweets are propagated (i.e., retweeted), whereas pro-vaccine tweets garner passive endorsements (i.e., are favorited). Our results also yielded two sets of linguistic features that act as facilitators and inhibitors of the propagation of vaccine-related tweets. Finally, our regression results show that anti-vaccine tweets tend to use the facilitators, while their pro-vaccine counterparts employ the inhibitors. The findings and algorithms from this study will aid public health officials' efforts to counteract vaccine misinformation, thereby facilitating the delivery of preventive measures during pandemics and epidemics.
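To make the pipeline described in the abstract more concrete, the following is a minimal Python sketch of its general shape: a two-stage labelling cascade followed by dictionary-based feature counting. It is not the authors' implementation. The abstract does not say what the binary stage decides, so treating it as a relevance filter is an assumption. The names `two_stage_label`, `category_rates`, `demo_is_relevant`, `demo_stance`, and `TOY_LEXICON` are all hypothetical. The keyword rules merely stand in for the fine-tuned BERT classifiers, and the toy lexicon stands in for the proprietary LIWC dictionaries.

```python
import re
from typing import Callable, Dict, Optional, Set

# Hypothetical toy lexicon. The real LIWC dictionaries are proprietary and
# much larger; these categories and words are illustrative only.
TOY_LEXICON: Dict[str, Set[str]] = {
    "negemo": {"afraid", "danger", "harm", "risk"},
    "certainty": {"always", "never", "proven"},
}

TOKEN_RE = re.compile(r"[a-z']+")


def category_rates(
    text: str, lexicon: Dict[str, Set[str]] = TOY_LEXICON
) -> Dict[str, float]:
    """Return, per category, the percentage of tokens found in its word set."""
    tokens = TOKEN_RE.findall(text.lower())
    if not tokens:
        return {name: 0.0 for name in lexicon}
    return {
        name: 100.0 * sum(tok in words for tok in tokens) / len(tokens)
        for name, words in lexicon.items()
    }


def two_stage_label(
    text: str,
    is_relevant: Callable[[str], bool],
    stance: Callable[[str], str],
) -> Optional[str]:
    """Stage 1 filters tweets; stage 2 assigns a stance to those that pass.

    Assumption: stage 1 is a binary relevance filter. Returns None for
    tweets rejected by stage 1.
    """
    if not is_relevant(text):
        return None
    return stance(text)


# Stand-in classifiers (assumptions). In the study, these roles are filled
# by fine-tuned BERT models, not keyword rules.
def demo_is_relevant(text: str) -> bool:
    return "vaccin" in text.lower()


def demo_stance(text: str) -> str:
    lowered = text.lower()
    if "danger" in lowered or "harm" in lowered:
        return "anti"
    if "safe" in lowered or "protect" in lowered:
        return "pro"
    return "neutral"


if __name__ == "__main__":
    for tweet in ["Vaccines are never a danger", "Nice weather today"]:
        label = two_stage_label(tweet, demo_is_relevant, demo_stance)
        print(tweet, "->", label, category_rates(tweet))
```

In a setup like this, the per-category rates for each labelled tweet would become predictors in a regression on retweet or favorite counts, which is the kind of analysis the abstract reports.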