Lynnette Hui Xian Ng, Dawn C. Robertson, Kathleen M. Carley
Online Social Networks and Media, March 2022. DOI: 10.1016/j.osnem.2022.100198
Stabilizing a supervised bot detection algorithm: How much data is needed for consistent predictions?
Social media bots play a prominent role in information diffusion and have been characterized by their use in digital activism and information manipulation. Detecting bots has been a major task within the field of social media computation, and many datasets and bot detection algorithms have been developed. With these algorithms, the stability of the bot score is key to estimating the impact of bots on the diffusion of information. Through several experiments on Twitter agents, we quantify the amount of data required for consistent bot predictions and analyze agent bot classification behavior. From this study, we develop a methodology to establish parameters for stabilizing the bot probability score through threshold, temporal, and volume analysis, ultimately quantifying suitable threshold values for bot classification (i.e., whether an agent is a bot or not) and a reasonable data collection size (i.e., number of days of tweets, or number of tweets) for stable scores and bot classification.
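The volume analysis described above can be illustrated with a minimal sketch. This is not the paper's code: the function names, the 0.5 classification threshold, and the 0.05 stability tolerance are illustrative assumptions. The idea is to recompute an agent's bot-probability score on progressively larger tweet samples and find the smallest sample size after which both the score and the resulting bot/human label stop changing.

```python
# Hedged sketch of the threshold/volume analysis; thresholds and names
# are illustrative assumptions, not values from the paper.

def classify(score, threshold=0.5):
    """Binary bot classification from a probability score.
    The 0.5 threshold is an assumed example value."""
    return score >= threshold

def min_tweets_for_stability(scores, threshold=0.5, eps=0.05):
    """scores[i] = bot probability computed from the first (i+1) tweets
    of one agent. Returns the smallest tweet count after which the score
    varies by at most eps AND the bot/human label no longer flips.
    Note: the final score alone is trivially stable, so for a non-empty
    series this always returns a count; a result close to len(scores)
    means the score never really settled."""
    for i in range(len(scores)):
        tail = scores[i:]
        score_stable = max(tail) - min(tail) <= eps
        label_stable = len({classify(s, threshold) for s in tail}) == 1
        if score_stable and label_stable:
            return i + 1
    return None  # only reached for an empty series

# Example: scores oscillate early, then settle above the threshold.
scores = [0.30, 0.62, 0.48, 0.55, 0.58, 0.57, 0.56, 0.57]
print(min_tweets_for_stability(scores))  # prints 4
```

Run per agent over a held-out tweet stream, the distribution of these minimum counts gives an empirical estimate of a reasonable data collection size for stable bot classification.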