Machine Learning Language Models: Achilles Heel for Social Media Platforms and a Possible Solution
R. Sear, R. Leahy, N. J. Restrepo, Y. Lupu, N. Johnson
Adv. Artif. Intell. Mach. Learn. (2021). DOI: 10.54364/aaiml.2021.1112
Any uptick in new misinformation that casts doubt on COVID-19 mitigation strategies, such as vaccine boosters and masks, could reverse society's recovery from the pandemic both nationally and globally. This study demonstrates how machine learning language models can automatically generate new COVID-19 and vaccine misinformation that appears fresh and realistic (i.e., human-generated) even to subject matter experts. The study uses the latest publicly and freely available version of the GPT model, GPT-2, and feeds it publicly available text collected from social media communities known for their high levels of health misinformation. The same team of subject matter experts that classified the original social media data used as input is then asked to categorize the GPT-2 output without knowing its automated origin. None of them successfully identified all the synthetic text strings as products of the machine model. This presents a clear warning for social media platforms: an effectively unlimited volume of fresh, seemingly human-produced misinformation can be generated perpetually on social media using current, off-the-shelf machine learning algorithms running continually. We then offer a possible solution: a statistical approach that detects differences between the dynamics of this machine output and typical human behavior.
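To make the generation step concrete: the sketch below shows how the public, off-the-shelf GPT-2 model can be prompted with seed text and sampled for novel continuations. This is a minimal illustration using the Hugging Face transformers library; the prompt string, sampling parameters, and model size are assumptions for illustration, not the authors' exact pipeline.

```python
# Hypothetical sketch: sampling "fresh" text from the public GPT-2 model,
# seeded with a prompt of the kind collected from social media communities.
# The prompt and generation parameters are illustrative assumptions.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Placeholder seed text; in the study, input came from monitored communities.
prompt = "Example seed text drawn from a health-discussion community"
inputs = tokenizer(prompt, return_tensors="pt")

# Top-k / nucleus sampling keeps each continuation varied, so repeated runs
# yield an effectively unlimited stream of distinct outputs.
outputs = model.generate(
    **inputs,
    max_length=100,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```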
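The abstract only summarizes the proposed statistical detection approach. As one plausible reading of "differences in the dynamics", the sketch below compares the inter-arrival times of a suspect posting stream against a baseline of typical human posting using a two-sample Kolmogorov-Smirnov test. The choice of inter-arrival times as the signal, the function names, and the significance threshold are all hypothetical assumptions, not the paper's specified method.

```python
# Hypothetical sketch: flag a posting stream whose timing dynamics differ
# significantly from a human baseline. This assumes inter-arrival times as
# the dynamical signal; the paper's actual statistic may differ.
import numpy as np
from scipy import stats

def interarrival_times(timestamps):
    """Seconds between consecutive posts, given POSIX timestamps."""
    ts = np.sort(np.asarray(timestamps, dtype=float))
    return np.diff(ts)

def flag_nonhuman_dynamics(suspect_ts, human_ts, alpha=0.01):
    """Return True if the suspect stream's inter-arrival distribution
    differs significantly from the human baseline (two-sample KS test)."""
    statistic, p_value = stats.ks_2samp(
        interarrival_times(suspect_ts),
        interarrival_times(human_ts),
    )
    return p_value < alpha
```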