Machine Learning Language Models: Achilles Heel for Social Media Platforms and a Possible Solution
R. Sear, R. Leahy, N. J. Restrepo, Y. Lupu, N. Johnson
Adv. Artif. Intell. Mach. Learn. (2021). DOI: 10.54364/aaiml.2021.1112
Any uptick in new misinformation that casts doubt on COVID-19 mitigation strategies, such as vaccine boosters and masks, could reverse society's recovery from the pandemic both nationally and globally. This study demonstrates how machine learning language models can automatically generate new COVID-19 and vaccine misinformation that appears fresh and realistic (i.e., human-generated) even to subject matter experts. The study uses the latest publicly and freely available version of the GPT model, GPT-2, and feeds it publicly available text collected from social media communities known for their high levels of health misinformation. The same team of subject matter experts that classified the original social media data used as input is then asked to categorize the GPT-2 output without knowing its automated origin. None of them successfully identified all the synthetic text strings as products of the machine model. This presents a clear warning for social media platforms: an effectively unlimited volume of fresh, seemingly human-produced misinformation can be generated perpetually on social media using current, off-the-shelf machine learning algorithms running continually. We then offer a possible solution: a statistical approach that detects differences between the dynamics of this machine output and typical human behavior.
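To make the generation step concrete: the sketch below shows how the public, off-the-shelf GPT-2 model can be prompted with seed text and sampled for novel continuations. This is a minimal illustration using the Hugging Face transformers library; the prompt string, sampling parameters, and model size are assumptions for illustration, not the authors' exact pipeline.

```python
# Hypothetical sketch: sampling "fresh" text from the public GPT-2 model,
# seeded with a prompt of the kind collected from social media communities.
# The prompt and generation parameters are illustrative assumptions.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Placeholder seed text; in the study, input came from monitored communities.
prompt = "Example seed text drawn from a health-discussion community"
inputs = tokenizer(prompt, return_tensors="pt")

# Top-k / nucleus sampling keeps each continuation varied, so repeated runs
# yield an effectively unlimited stream of distinct outputs.
outputs = model.generate(
    **inputs,
    max_length=100,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```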
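The abstract only summarizes the proposed statistical detection approach. As one plausible reading of "differences in the dynamics", the sketch below compares the inter-arrival times of a suspect posting stream against a baseline of typical human posting using a two-sample Kolmogorov-Smirnov test. The choice of inter-arrival times as the signal, the function names, and the significance threshold are all hypothetical assumptions, not the paper's specified method.

```python
# Hypothetical sketch: flag a posting stream whose timing dynamics differ
# significantly from a human baseline. This assumes inter-arrival times as
# the dynamical signal; the paper's actual statistic may differ.
import numpy as np
from scipy import stats

def interarrival_times(timestamps):
    """Seconds between consecutive posts, given POSIX timestamps."""
    ts = np.sort(np.asarray(timestamps, dtype=float))
    return np.diff(ts)

def flag_nonhuman_dynamics(suspect_ts, human_ts, alpha=0.01):
    """Return True if the suspect stream's inter-arrival distribution
    differs significantly from the human baseline (two-sample KS test)."""
    statistic, p_value = stats.ks_2samp(
        interarrival_times(suspect_ts),
        interarrival_times(human_ts),
    )
    return p_value < alpha
```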