Machine Learning Language Models: Achilles Heel for Social Media Platforms and a Possible Solution

R. Sear, R. Leahy, N. J. Restrepo, Y. Lupu, N. Johnson
Journal: Adv. Artif. Intell. Mach. Learn. · DOI: 10.54364/aaiml.2021.1112
Citations: 3

Abstract

Any uptick in new misinformation that casts doubt on COVID-19 mitigation strategies, such as vaccine boosters and masks, could reverse society's recovery from the pandemic both nationally and globally. This study demonstrates how machine learning language models can automatically generate new COVID-19 and vaccine misinformation that appears fresh and realistic (i.e. human-generated) even to subject matter experts. The study uses the latest version of the GPT model that is public and freely available, GPT-2, and inputs publicly available text collected from social media communities that are known for their high levels of health misinformation. The same team of subject matter experts that classified the original social media data used as input is then asked to categorize the GPT-2 output without knowing its automated origin. None of them successfully identified all the synthetic text strings as products of the machine model. This presents a clear warning for social media platforms: an unlimited volume of fresh and seemingly human-produced misinformation can be created perpetually on social media using current, off-the-shelf machine learning algorithms that run continually. We then offer a solution: a statistical approach that detects differences in the dynamics of this output as compared to typical human behavior.
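The abstract does not specify which dynamical signal the proposed detector uses, but one common way to operationalize "differences in dynamics" is to compare an account's inter-post timing distribution against typical human posting behavior. The sketch below is purely illustrative and not the authors' method: it assumes (hypothetically) that human posting gaps are bursty and heavy-tailed while scripted output arrives at near-regular intervals, and uses a two-sample Kolmogorov–Smirnov statistic as the distance measure.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical "human" account: bursty posting, with heavy-tailed
# inter-post gaps (a lognormal is used here as a stand-in).
human_gaps = rng.lognormal(mean=1.0, sigma=1.5, size=500)

# Hypothetical "machine" account: a script posting on a timer,
# so gaps cluster tightly around a fixed interval.
bot_gaps = rng.normal(loc=10.0, scale=0.5, size=500)

def dynamics_score(gaps, reference):
    """Two-sample KS statistic between an account's inter-post
    gaps and a reference sample of typical human gaps.
    Larger values mean the account's timing looks less human."""
    return stats.ks_2samp(gaps, reference).statistic

# Reference pool of "typical human" gaps, drawn from the same
# assumed lognormal model.
reference = rng.lognormal(mean=1.0, sigma=1.5, size=5000)

human_score = dynamics_score(human_gaps, reference)
bot_score = dynamics_score(bot_gaps, reference)
# The scripted account's timing distribution sits far from the
# human reference, while another human sample stays close to it.
```

A platform could threshold such a score (or feed it, with other behavioral features, into a classifier) to flag accounts whose posting dynamics diverge from the human baseline, which is the general shape of the statistical defense the abstract proposes.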