A semi-supervised segmentation algorithm as applied to k-means using information value

ORiON, Pub Date: 2017-12-08, DOI: 10.5784/33-2-568
D. G. Breed, T. Verster, S. Terblanche
Pages: 85-103
Citations: 2

Abstract

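Segmentation (or partitioning) of data for the purpose of enhancing predictive modelling is a well-established practice in the banking industry. Unsupervised and supervised approaches are the two main streams of segmentation, and examples exist where the application of these techniques improved the performance of predictive models. Both these streams focus, however, on a single aspect (i.e. either target separation or independent variable distribution), and combining them may deliver better results in some instances. In this paper a semi-supervised segmentation algorithm is presented, which is based on k-means clustering and which applies information value for the purpose of informing the segmentation process. Simulated data are used to identify a few key characteristics that may cause one segmentation technique to outperform another. In the empirical study the newly proposed semi-supervised segmentation algorithm outperforms both an unsupervised and a supervised segmentation technique, when compared by using the Gini coefficient as performance measure of the resulting predictive models.

Key words: Banking, clustering, multivariate statistics, data mining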
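The abstract relies on two standard credit-scoring measures: information value (IV), which the algorithm uses to inform the segmentation, and the Gini coefficient, used to compare the resulting predictive models. For a binned characteristic with good and bad counts g_i and b_i per bin (totals G and B), IV = Σ_i (g_i/G − b_i/B) · ln((g_i/G)/(b_i/B)), and Gini = 2 · AUC − 1. The NumPy sketch below computes both. It is only an illustration of these two measures, not the authors' segmentation algorithm; the function names, the 1 = "bad" target coding and the 0.5 pseudo-count for empty bins are assumptions of this sketch.

```python
import numpy as np


def information_value(bins, target, smoothing=0.5):
    """Information value of a binned characteristic against a binary target.

    bins:      1-D array of bin labels (the characteristic, already binned)
    target:    1-D array of 0/1 outcomes (1 = "bad", 0 = "good")
    smoothing: pseudo-count added to every cell so empty cells never give
               log(0); an assumption of this sketch, not taken from the paper
    """
    bins = np.asarray(bins)
    target = np.asarray(target)
    labels = np.unique(bins)
    # Totals include the pseudo-counts so each distribution still sums to 1.
    n_bad = target.sum() + smoothing * len(labels)
    n_good = (1 - target).sum() + smoothing * len(labels)
    iv = 0.0
    for label in labels:
        in_bin = bins == label
        bad = target[in_bin].sum() + smoothing
        good = (1 - target[in_bin]).sum() + smoothing
        p_good, p_bad = good / n_good, bad / n_bad
        # (good share - bad share) * weight of evidence for this bin
        iv += (p_good - p_bad) * np.log(p_good / p_bad)
    return float(iv)


def gini(target, scores):
    """Gini coefficient of a score, computed as 2 * AUC - 1.

    AUC is the probability that a randomly chosen bad (target == 1) scores
    higher than a randomly chosen good; ties count as one half. Both classes
    must be present in the sample.
    """
    target = np.asarray(target)
    scores = np.asarray(scores, dtype=float)
    pos = scores[target == 1]
    neg = scores[target == 0]
    # Compare every bad with every good via broadcasting (fine for small samples).
    diff = pos[:, None] - neg[None, :]
    auc = (diff > 0).mean() + 0.5 * (diff == 0).mean()
    return float(2 * auc - 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.integers(0, 4, size=1000)                     # characteristic with 4 bins
    y = (rng.random(1000) < 0.05 + 0.1 * x).astype(int)  # bad rate rises with the bin
    print("IV:", information_value(x, y))
    print("Gini:", gini(y, x))
```

A characteristic whose bad rate does not change across bins has an IV of zero, while one that separates goods from bads well has a large IV; in the same way, a score that ranks every bad above every good has a Gini of 1.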