Authors: N. Samsudin, A. Mustapha, M. Wahab
Published in: 2016 12th International Conference on Innovations in Information Technology (IIT), 2016-11-01
DOI: 10.1109/INNOVATIONS.2016.7880046
Ensemble classification of cyber space users tendency in blog writing using random forest
As blogs become widespread, extracting information from them is necessary for dealing with social, political, criminal, and other issues. This research builds on Gharehchopogh et al. [2], [3], who used the C4.5 and K-Nearest Neighbor (K-NN) algorithms to classify bloggers from the Kohkilooyeh and Boyer Ahmad province in Iran as professional or otherwise. As a comparison, this paper proposes the Random Forest algorithm to perform the blog classification using the same dataset. The results show that ensemble classification via the Random Forest algorithm achieves a higher precision of 88%, compared with 82% for the C4.5 algorithm and 84% for K-NN in the previous research.
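The ensemble idea and the precision metric mentioned above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the trees are simplified to one-level stumps, the function names are hypothetical, and the feature values and labels in the demo are invented toy data, not the blogger dataset used in the paper.

```python
import random
from collections import Counter


def bootstrap(rows, rng):
    """Draw len(rows) rows with replacement (the bagging step)."""
    return [rng.choice(rows) for _ in rows]


def train_stump(rows, feature_idxs):
    """Fit a one-level tree on categorical features.

    Among the candidate features, keep the one whose per-value majority
    label misclassifies the fewest training rows.
    """
    default = Counter(label for _, label in rows).most_common(1)[0][0]
    best = None
    for f in feature_idxs:
        by_value = {}
        for x, label in rows:
            by_value.setdefault(x[f], Counter())[label] += 1
        table = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(1 for x, label in rows if table[x[f]] != label)
        if best is None or errors < best[0]:
            best = (errors, f, table)
    _, f, table = best
    return f, table, default


def predict_stump(stump, x):
    """Look up the label for x; fall back to the majority label."""
    f, table, default = stump
    return table.get(x[f], default)


def train_forest(rows, n_trees=25, n_features=2, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random
    subset of n_features feature indices."""
    rng = random.Random(seed)
    n_total = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = bootstrap(rows, rng)
        feats = rng.sample(range(n_total), n_features)
        forest.append(train_stump(sample, feats))
    return forest


def predict_forest(forest, x):
    """Majority vote over all stumps in the forest."""
    votes = Counter(predict_stump(s, x) for s in forest)
    return votes.most_common(1)[0][0]


def precision(y_true, y_pred, positive):
    """Precision = TP / (TP + FP) for the chosen positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0


if __name__ == "__main__":
    # Invented toy rows: (education, posts_weekly, topic) -> label.
    toy_rows = [
        (("high", "yes", "tech"), "pro"),
        (("high", "yes", "news"), "pro"),
        (("high", "no", "tech"), "pro"),
        (("low", "yes", "news"), "not"),
        (("low", "no", "tech"), "not"),
        (("low", "no", "news"), "not"),
    ]
    forest = train_forest(toy_rows, n_trees=25, n_features=2, seed=0)
    y_true = [label for _, label in toy_rows]
    y_pred = [predict_forest(forest, x) for x, _ in toy_rows]
    print("training precision for 'pro':", precision(y_true, y_pred, "pro"))
```

A full Random Forest grows deeper trees and is normally evaluated on held-out data or by cross-validation; the training-set precision printed here only demonstrates the mechanics of bagging, random feature selection, and majority voting.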