A semi-supervised segmentation algorithm as applied to k-means using information value

ORiON, Pub Date: 2017-12-08, DOI: 10.5784/33-2-568

D. G. Breed, T. Verster, S. Terblanche

Pages: 85-103

Citations: 2

Abstract


Segmentation (or partitioning) of data for the purpose of enhancing predictive modelling is a well-established practice in the banking industry. Unsupervised and supervised approaches are the two main streams of segmentation, and examples exist where applying these techniques improved the performance of predictive models. Both streams focus, however, on a single aspect (i.e. either target separation or independent variable distribution), and combining them may deliver better results in some instances. In this paper a semi-supervised segmentation algorithm is presented, which is based on k-means clustering and which applies information value to inform the segmentation process. Simulated data are used to identify a few key characteristics that may cause one segmentation technique to outperform another. In the empirical study the newly proposed semi-supervised segmentation algorithm outperforms both an unsupervised and a supervised segmentation technique when the Gini coefficient of the resulting predictive models is used as the performance measure.

Key words: Banking, clustering, multivariate statistics, data mining
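The abstract names two quantities without defining them: information value, which informs the segmentation, and the Gini coefficient, which scores the resulting models. The sketch below shows the standard textbook definitions of both for a binary target. It is not the authors' algorithm; the function names, the smoothing constant `eps`, and the toy data are assumptions made for illustration only.

```python
import math
from collections import Counter


def information_value(segments, targets, eps=0.5):
    """Information value of a segmentation with respect to a binary target.

    segments: segment label per observation
    targets:  0/1 outcome per observation (1 = event, e.g. default)
    eps:      smoothing count per cell to avoid log(0) (assumed, not from the paper)
    """
    events, non_events = Counter(), Counter()
    for seg, y in zip(segments, targets):
        if y == 1:
            events[seg] += 1
        else:
            non_events[seg] += 1
    labels = set(events) | set(non_events)
    total_events = sum(events.values()) + eps * len(labels)
    total_non_events = sum(non_events.values()) + eps * len(labels)
    iv = 0.0
    for seg in labels:
        p_event = (events[seg] + eps) / total_events
        p_non_event = (non_events[seg] + eps) / total_non_events
        # Each term is non-negative: the difference and the log share a sign.
        iv += (p_non_event - p_event) * math.log(p_non_event / p_event)
    return iv


def gini(scores, targets):
    """Gini coefficient of a score, computed as 2 * AUC - 1."""
    pos = [s for s, y in zip(scores, targets) if y == 1]
    neg = [s for s, y in zip(scores, targets) if y == 0]
    # AUC = share of (event, non-event) pairs ranked correctly; ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    return 2 * auc - 1


# Toy data (hypothetical): one segmentation separates the target, one does not.
targets = [0, 0, 0, 0, 1, 1, 1, 1]
separating = [0, 0, 0, 0, 1, 1, 1, 1]
uninformative = [0, 1, 0, 1, 0, 1, 0, 1]
iv_separating = information_value(separating, targets)
iv_uninformative = information_value(uninformative, targets)
```

A segmentation that isolates events from non-events gets a higher information value than one whose segments all share the same event rate, which is the kind of signal a semi-supervised method could weigh alongside the usual k-means distance criterion.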


Source journal

ORiON

Self-citation rate

0.00%

Articles published