Identifying Twitter Spam by Utilizing Random Forests
Humza Haider
Scholarly Horizons: University of Minnesota, Morris Undergraduate Journal
Published 2017-07-10
DOI: https://doi.org/10.61366/2576-2176.1046
Citations: 0
Abstract
The use of Twitter has grown rapidly since the first tweet in 2006, and the number of spammers on the platform shows a similar increase. Classifying users as spammers or non-spammers has been heavily researched, and new spam-detection methods are developing rapidly. One such classification technique is the random forest. We examine three studies that apply random forests to user-based, geo-tagged, and time-dependent features. Each study reported high accuracy and F-measures, with one exception: a model whose test set contained a more realistic proportion of spam than typical testing procedures use. These studies suggest that random forests, combined with distinctive feature selection, can identify spam and spammers with high accuracy but may fall short when applied to real-world situations.
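The effect of the test set's spam proportion on the F-measure can be illustrated with a short calculation. The sketch below is not taken from any of the three studies: the function name `f_measure`, the true-positive rate of 0.90, the false-positive rate of 0.05, and the two spam proportions are all assumed values chosen for illustration. It holds the classifier's per-class behavior fixed and changes only the share of spam in the test set.

```python
def f_measure(tpr, fpr, spam_fraction):
    """F-measure (F1) for the spam class.

    tpr: fraction of spam correctly flagged (recall), assumed > 0
    fpr: fraction of legitimate items wrongly flagged as spam
    spam_fraction: share of the test set that is spam
    """
    tp = tpr * spam_fraction              # spam caught
    fn = (1 - tpr) * spam_fraction        # spam missed
    fp = fpr * (1 - spam_fraction)        # legitimate items flagged
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical classifier: same per-class rates in both settings.
balanced = f_measure(tpr=0.90, fpr=0.05, spam_fraction=0.50)
realistic = f_measure(tpr=0.90, fpr=0.05, spam_fraction=0.05)

print(f"balanced test set (50% spam): F1 = {balanced:.3f}")
print(f"realistic test set (5% spam): F1 = {realistic:.3f}")
```

Under these assumed rates the F-measure falls from roughly 0.92 to roughly 0.63, even though the classifier itself is unchanged. When spam is rare, a small false-positive rate on the much larger legitimate class produces enough false alarms to pull precision down, which is one plausible reading of why a model evaluated on a realistic proportion of spam would score lower.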