Stabilizing a supervised bot detection algorithm: How much data is needed for consistent predictions?
Lynnette Hui Xian Ng, Dawn C. Robertson, Kathleen M. Carley
Online Social Networks and Media, March 2022
DOI: 10.1016/j.osnem.2022.100198
https://www.sciencedirect.com/science/article/pii/S2468696422000027
Citations: 17
Abstract
Social media bots have been characterized by their use in digital activism and information manipulation, owing to their role in information diffusion. Bot detection has been a major task within the field of social media computation, and many datasets and bot detection algorithms have been developed. For these algorithms, the stability of the bot score is key to estimating the impact of bots on the diffusion of information. Through several experiments on Twitter agents, we quantify the amount of data required for consistent bot predictions and analyze agents' bot classification behavior. In this study, we develop a methodology that establishes parameters for stabilizing the bot probability score through threshold, temporal, and volume analysis, ultimately quantifying suitable threshold values for bot classification (i.e., whether an agent is a bot or not) and a reasonable data collection size (i.e., number of days of tweets or number of tweets) for stable scores and classifications.
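To make the volume-analysis idea concrete, below is a minimal Python sketch of one way to estimate how many tweets are needed before a bot score stabilizes. It is an illustration under stated assumptions, not the paper's implementation: `bot_score` is a hypothetical stand-in for any supervised detector mapping a tweet sample to a probability in [0, 1], and the step size, tolerance, patience, and 0.5 threshold are placeholder values, not results from the study.

```python
# Sketch: grow the tweet sample and stop once the bot probability score
# stops moving. All parameter values here are illustrative assumptions.
from typing import Callable, Optional, Sequence


def classify(score: float, threshold: float = 0.5) -> bool:
    """Binary bot classification: bot if the score meets the threshold.

    The 0.5 threshold is a placeholder; the paper's point is that a
    suitable threshold must be established empirically.
    """
    return score >= threshold


def tweets_needed_for_stability(
    tweets: Sequence[str],
    bot_score: Callable[[Sequence[str]], float],
    step: int = 10,
    tolerance: float = 0.05,
    patience: int = 3,
) -> Optional[int]:
    """Return the smallest sample size at which the score is stable.

    Scores the first `step`, `2*step`, ... tweets and declares stability
    once the score stays within `tolerance` of its previous value for
    `patience` consecutive increments. Returns None if the score never
    stabilizes within the available data.
    """
    prev_score: Optional[float] = None
    stable_runs = 0
    for n in range(step, len(tweets) + 1, step):
        score = bot_score(tweets[:n])
        if prev_score is not None and abs(score - prev_score) <= tolerance:
            stable_runs += 1
            if stable_runs >= patience:
                return n
        else:
            stable_runs = 0  # score moved; reset the stability counter
        prev_score = score
    return None
```

Used with any scorer, e.g. `n = tweets_needed_for_stability(agent_tweets, my_detector)`, this yields a per-agent estimate of the data volume needed for a consistent prediction, analogous in spirit to the collection-size question the abstract raises.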