从Twitter聊天中早期检测欺诈性COVID-19产品:使用异常检测的数据集和基线方法

IF 2.3 Q1 HEALTH CARE SCIENCES & SERVICES JMIR infodemiology Pub Date : 2023-01-01 DOI:10.2196/43694

Abeed Sarker, Sahithi Lakamana, Ruqi Liao, Aamir Abbas, Yuan-Chi Yang, Mohammed Al-Garadi

{"title":"从Twitter聊天中早期检测欺诈性COVID-19产品:使用异常检测的数据集和基线方法","authors":"Abeed Sarker, Sahithi Lakamana, Ruqi Liao, Aamir Abbas, Yuan-Chi Yang, Mohammed Al-Garadi","doi":"10.2196/43694","DOIUrl":null,"url":null,"abstract":"Background: Social media has served as a lucrative platform for spreading misinformation and for promoting fraudulent products for the treatment, testing, and prevention of COVID-19. This has resulted in the issuance of many warning letters by the US Food and Drug Administration (FDA). While social media continues to serve as the primary platform for the promotion of such fraudulent products, it also presents the opportunity to identify these products early by using effective social media mining methods.Objective: Our objectives were to (1) create a data set of fraudulent COVID-19 products that can be used for future research and (2) propose a method using data from Twitter for automatically detecting heavily promoted COVID-19 products early.Methods: We created a data set from FDA-issued warnings during the early months of the COVID-19 pandemic. We used natural language processing and time-series anomaly detection methods for automatically detecting fraudulent COVID-19 products early from Twitter. Our approach is based on the intuition that increases in the popularity of fraudulent products lead to corresponding anomalous increases in the volume of chatter regarding them. We compared the anomaly signal generation date for each product with the corresponding FDA letter issuance date. We also performed a brief manual analysis of chatter associated with 2 products to characterize their contents.Results: FDA warning issue dates ranged from March 6, 2020, to June 22, 2021, and 44 key phrases representing fraudulent products were included. From 577,872,350 posts made between February 19 and December 31, 2020, which are all publicly available, our unsupervised approach detected 34 out of 44 (77.3%) signals about fraudulent products earlier than the FDA letter issuance dates, and an additional 6 (13.6%) within a week following the corresponding FDA letters. Content analysis revealed misinformation, information, political, and conspiracy theories to be prominent topics.Conclusions: Our proposed method is simple, effective, easy to deploy, and does not require high-performance computing machinery unlike deep neural network-based methods. The method can be easily extended to other types of signal detection from social media data. The data set may be used for future research and the development of more advanced methods.","PeriodicalId":73554,"journal":{"name":"JMIR infodemiology","volume":"3 ","pages":"e43694"},"PeriodicalIF":2.3000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10131818/pdf/","citationCount":"0","resultStr":"{\"title\":\"The Early Detection of Fraudulent COVID-19 Products From Twitter Chatter: Data Set and Baseline Approach Using Anomaly Detection.\",\"authors\":\"Abeed Sarker, Sahithi Lakamana, Ruqi Liao, Aamir Abbas, Yuan-Chi Yang, Mohammed Al-Garadi\",\"doi\":\"10.2196/43694\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Background: Social media has served as a lucrative platform for spreading misinformation and for promoting fraudulent products for the treatment, testing, and prevention of COVID-19. This has resulted in the issuance of many warning letters by the US Food and Drug Administration (FDA). While social media continues to serve as the primary platform for the promotion of such fraudulent products, it also presents the opportunity to identify these products early by using effective social media mining methods.Objective: Our objectives were to (1) create a data set of fraudulent COVID-19 products that can be used for future research and (2) propose a method using data from Twitter for automatically detecting heavily promoted COVID-19 products early.Methods: We created a data set from FDA-issued warnings during the early months of the COVID-19 pandemic. We used natural language processing and time-series anomaly detection methods for automatically detecting fraudulent COVID-19 products early from Twitter. Our approach is based on the intuition that increases in the popularity of fraudulent products lead to corresponding anomalous increases in the volume of chatter regarding them. We compared the anomaly signal generation date for each product with the corresponding FDA letter issuance date. We also performed a brief manual analysis of chatter associated with 2 products to characterize their contents.Results: FDA warning issue dates ranged from March 6, 2020, to June 22, 2021, and 44 key phrases representing fraudulent products were included. From 577,872,350 posts made between February 19 and December 31, 2020, which are all publicly available, our unsupervised approach detected 34 out of 44 (77.3%) signals about fraudulent products earlier than the FDA letter issuance dates, and an additional 6 (13.6%) within a week following the corresponding FDA letters. Content analysis revealed misinformation, information, political, and conspiracy theories to be prominent topics.Conclusions: Our proposed method is simple, effective, easy to deploy, and does not require high-performance computing machinery unlike deep neural network-based methods. The method can be easily extended to other types of signal detection from social media data. The data set may be used for future research and the development of more advanced methods.\",\"PeriodicalId\":73554,\"journal\":{\"name\":\"JMIR infodemiology\",\"volume\":\"3 \",\"pages\":\"e43694\"},\"PeriodicalIF\":2.3000,\"publicationDate\":\"2023-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10131818/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"JMIR infodemiology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.2196/43694\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"HEALTH CARE SCIENCES & SERVICES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"JMIR infodemiology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2196/43694","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}

引用次数: 0

摘要

背景:社交媒体已经成为传播错误信息和推广用于治疗、检测和预防COVID-19的欺诈性产品的有利可图的平台。这导致美国食品和药物管理局(FDA)发出了许多警告信。虽然社交媒体仍然是推广此类欺诈性产品的主要平台，但它也提供了通过有效的社交媒体挖掘方法及早识别这些产品的机会。目的:我们的目标是(1)创建一个欺诈性COVID-19产品的数据集，可用于未来的研究;(2)提出一种使用Twitter数据的方法，用于早期自动检测大力推广的COVID-19产品。方法:我们创建了一个数据集，该数据集来自fda在COVID-19大流行的最初几个月发布的警告。我们使用自然语言处理和时间序列异常检测方法来自动检测Twitter上的虚假COVID-19产品。我们的方法是基于这样一种直觉，即欺诈性产品的普及程度的增加会导致与之相关的聊天量的相应异常增加。我们将每种产品的异常信号产生日期与相应的FDA信函发布日期进行了比较。我们还对与2种产品相关的颤振进行了简短的手工分析，以表征其内容。结果:FDA警告发布日期为2020年3月6日至2021年6月22日，其中包括44个代表欺诈产品的关键短语。从2020年2月19日至12月31日期间公开发布的577,872,350个帖子中，我们的无监督方法在FDA信函发布日期之前检测到44个(77.3%)关于欺诈性产品的信号，另外6个(13.6%)在相应的FDA信函发布后一周内检测到。内容分析显示，错误信息、信息、政治和阴谋论是突出的话题。结论:我们提出的方法简单、有效、易于部署，并且不像基于深度神经网络的方法那样需要高性能的计算机器。该方法可以很容易地扩展到其他类型的社交媒体数据信号检测。该数据集可用于未来的研究和开发更先进的方法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

摘要图片

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

The Early Detection of Fraudulent COVID-19 Products From Twitter Chatter: Data Set and Baseline Approach Using Anomaly Detection.

Background: Social media has served as a lucrative platform for spreading misinformation and for promoting fraudulent products for the treatment, testing, and prevention of COVID-19. This has resulted in the issuance of many warning letters by the US Food and Drug Administration (FDA). While social media continues to serve as the primary platform for the promotion of such fraudulent products, it also presents the opportunity to identify these products early by using effective social media mining methods.

Objective: Our objectives were to (1) create a data set of fraudulent COVID-19 products that can be used for future research and (2) propose a method using data from Twitter for automatically detecting heavily promoted COVID-19 products early.

Methods: We created a data set from FDA-issued warnings during the early months of the COVID-19 pandemic. We used natural language processing and time-series anomaly detection methods for automatically detecting fraudulent COVID-19 products early from Twitter. Our approach is based on the intuition that increases in the popularity of fraudulent products lead to corresponding anomalous increases in the volume of chatter regarding them. We compared the anomaly signal generation date for each product with the corresponding FDA letter issuance date. We also performed a brief manual analysis of chatter associated with 2 products to characterize their contents.

Results: FDA warning issue dates ranged from March 6, 2020, to June 22, 2021, and 44 key phrases representing fraudulent products were included. From 577,872,350 posts made between February 19 and December 31, 2020, which are all publicly available, our unsupervised approach detected 34 out of 44 (77.3%) signals about fraudulent products earlier than the FDA letter issuance dates, and an additional 6 (13.6%) within a week following the corresponding FDA letters. Content analysis revealed misinformation, information, political, and conspiracy theories to be prominent topics.

Conclusions: Our proposed method is simple, effective, easy to deploy, and does not require high-performance computing machinery unlike deep neural network-based methods. The method can be easily extended to other types of signal detection from social media data. The data set may be used for future research and the development of more advanced methods.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

JMIR infodemiology

CiteScore

4.80

自引率

0.00%

发文量