{"title":"A cost sensitive classifier for Big Data","authors":"A. Haldankar, Kiran Bhowmick","doi":"10.1109/ICAECCT.2016.7942567","DOIUrl":null,"url":null,"abstract":"Data Mining techniques have been used to detect fraud related to several domains like risk identification. An assumption about the data is that it is always balanced, this is far from true. It doesn't represent the reality. In this paper we develop a cost sensitive classifier to detect Risk using the Statlog (German Credit Data) data set. This study shows how application of proper feature selection followed by using a unique combination of ensemble & thresholding helps to reduce the overall cost. We also see the effects of this classifier on unstructured data as well as streaming data.","PeriodicalId":6629,"journal":{"name":"2016 IEEE International Conference on Advances in Electronics, Communication and Computer Technology (ICAECCT)","volume":"143 1","pages":"122-127"},"PeriodicalIF":0.0000,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE International Conference on Advances in Electronics, Communication and Computer Technology (ICAECCT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICAECCT.2016.7942567","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 4
Abstract
Data mining techniques have been used to detect fraud in several domains, such as risk identification. A common assumption is that the data are always balanced, but this is far from true and does not reflect reality. In this paper we develop a cost-sensitive classifier to detect risk using the Statlog (German Credit Data) data set. This study shows how proper feature selection, followed by a unique combination of ensemble methods and thresholding, helps to reduce the overall cost. We also examine the effects of this classifier on unstructured data as well as streaming data.
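The abstract does not spell out how thresholding lowers cost, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a binary setting where label 1 means a bad (risky) applicant and 0 a good one, and it uses the 5:1 cost ratio from the Statlog German Credit documentation (accepting a bad applicant costs 5, rejecting a good one costs 1). The function names, the toy scores, and the toy labels are hypothetical and chosen for illustration.

```python
def total_cost(y_true, p_bad, threshold, cost_fn=5.0, cost_fp=1.0):
    """Total misclassification cost when predicting 'bad' if p_bad >= threshold.

    cost_fn: cost of predicting 'good' for a truly bad applicant (assumed 5).
    cost_fp: cost of predicting 'bad' for a truly good applicant (assumed 1).
    """
    cost = 0.0
    for y, p in zip(y_true, p_bad):
        pred = 1 if p >= threshold else 0
        if y == 1 and pred == 0:
            cost += cost_fn
        elif y == 0 and pred == 1:
            cost += cost_fp
    return cost


def bayes_threshold(cost_fn=5.0, cost_fp=1.0):
    """Cost-minimizing threshold for well-calibrated probabilities."""
    return cost_fp / (cost_fp + cost_fn)


def best_threshold(y_true, p_bad, cost_fn=5.0, cost_fp=1.0):
    """Empirical search: try each observed score as a candidate threshold."""
    candidates = sorted(set(p_bad)) + [1.0 + 1e-9]  # last one predicts all 'good'
    return min(candidates,
               key=lambda t: total_cost(y_true, p_bad, t, cost_fn, cost_fp))


# Hypothetical toy data: true labels and predicted probabilities of 'bad'.
y_true = [1, 1, 0, 0, 0]
p_bad = [0.9, 0.3, 0.4, 0.2, 0.1]

cost_default = total_cost(y_true, p_bad, 0.5)               # misses one bad applicant
cost_shifted = total_cost(y_true, p_bad, bayes_threshold()) # threshold 1/6
tuned = best_threshold(y_true, p_bad)
```

On this toy data the default 0.5 threshold misses one bad applicant, while lowering the threshold trades that expensive error for cheaper false alarms. In practice the scores would come from the trained ensemble, and the threshold would be tuned on held-out data rather than on the data used for evaluation.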