A cost sensitive classifier for Big Data

A. Haldankar, Kiran Bhowmick
DOI: 10.1109/ICAECCT.2016.7942567
Published in: 2016 IEEE International Conference on Advances in Electronics, Communication and Computer Technology (ICAECCT), pp. 122-127
Publication date: 2016-12-01
Link: https://doi.org/10.1109/ICAECCT.2016.7942567
Citations: 4

Abstract

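Data mining techniques have been used to detect fraud in several domains, such as risk identification. These techniques often assume that the data are balanced, meaning the classes occur in roughly equal proportions; this is far from true and does not reflect reality. In this paper we develop a cost sensitive classifier that detects risk using the Statlog (German Credit Data) data set. The study shows how proper feature selection, followed by a unique combination of ensemble learning and thresholding, helps reduce the overall cost. We also examine the effects of this classifier on unstructured data as well as streaming data.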
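The abstract names ensemble learning and thresholding as the levers that lower overall cost, but it does not give the rule itself. The sketch below shows one standard way these pieces can fit together, assuming each model outputs a probability that an applicant is a bad credit risk. The 5:1 cost ratio is the cost matrix distributed with the Statlog (German Credit Data) set (accepting a bad applicant costs 5, rejecting a good one costs 1). Everything else is an illustrative assumption, not the authors' implementation: the function names, the soft-voting ensemble, and the toy scores. With calibrated probabilities, flagging an applicant as bad pays off when p·5 > (1 − p)·1, i.e. when p > 1/6; alternatively, the threshold can be tuned directly on validation data.

```python
from typing import List, Sequence

# Cost matrix distributed with the Statlog (German Credit Data) set:
# labelling a bad applicant as good (false negative) costs 5,
# labelling a good applicant as bad (false positive) costs 1.
COST_FN = 5.0
COST_FP = 1.0


def average_ensemble(prob_lists: Sequence[Sequence[float]]) -> List[float]:
    """Soft voting: average each applicant's P(bad) across several models.

    Hypothetical stand-in ensemble; the abstract does not say which
    ensemble the authors combine with thresholding.
    """
    return [sum(ps) / len(ps) for ps in zip(*prob_lists)]


def total_cost(p_bad: Sequence[float], y_bad: Sequence[int], threshold: float) -> float:
    """Total misclassification cost when every score >= threshold is flagged as bad."""
    cost = 0.0
    for p, y in zip(p_bad, y_bad):
        flagged_bad = p >= threshold
        if y == 1 and not flagged_bad:
            cost += COST_FN  # bad risk accepted
        elif y == 0 and flagged_bad:
            cost += COST_FP  # good applicant rejected
    return cost


def bayes_threshold(cost_fp: float = COST_FP, cost_fn: float = COST_FN) -> float:
    """Break-even probability where p * cost_fn equals (1 - p) * cost_fp.

    Minimises expected cost only if the scores are calibrated probabilities.
    """
    return cost_fp / (cost_fp + cost_fn)


def tuned_threshold(p_bad: Sequence[float], y_bad: Sequence[int]) -> float:
    """Pick the threshold with the lowest total cost on validation data.

    Candidates are the observed scores plus +inf (flag nobody).
    """
    candidates = sorted(set(p_bad)) + [float("inf")]
    return min(candidates, key=lambda t: total_cost(p_bad, y_bad, t))


if __name__ == "__main__":
    # Toy validation scores from two hypothetical models (not data from the paper).
    model_a = [0.10, 0.40, 0.35, 0.80, 0.05, 0.60]
    model_b = [0.20, 0.30, 0.25, 0.90, 0.15, 0.30]
    y_bad = [0, 1, 0, 1, 0, 0]  # 1 = bad credit risk

    p_bad = average_ensemble([model_a, model_b])
    for name, t in [("default 0.5", 0.5),
                    ("cost-based", bayes_threshold()),
                    ("tuned", tuned_threshold(p_bad, y_bad))]:
        print(f"{name:12s} threshold={t:.3f} cost={total_cost(p_bad, y_bad, t)}")
```

On real data, tune the threshold on a validation split rather than on the test set. Otherwise the reported cost will be optimistically biased.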