{"title":"A Discretization Algorithm of Continuous Attributes Based on Supervised Clustering","authors":"Haiyang Hua, Huaici Zhao","doi":"10.1109/CCPR.2009.5344142","DOIUrl":null,"url":null,"abstract":"Many machine learning algorithms can be applied only to data described by categorical attributes. So discretizatioti of continuous attributes is one of the important steps in preprocessing of extracting knowledge. Traditional discretization algorithms based on clustering need a pre-determined clustering number k, also typically are applied in an unsupervised learning framework. This paper describes such an algorithm, called SX-means (Supervised X-means), which is a new algorithm of supervised discretization of continuous attributes on clustering. The algorithm modifies clusters with knowledge of the class distribution dynamically. And this procedure can not stop until the proper k is found. For the number of clusters k is not pre-determined by the user and class distribution is applied, the random of result is decreased greatly. Experimental evaluation of several discretization algorithms on six artificial data sets show that the proposed algorithm is more efficient and can generate a better discretization schema. Comparing the output of C4.5, resulting tree is smaller, less classification rules, and high accuracy of classification.","PeriodicalId":354468,"journal":{"name":"2009 Chinese Conference on Pattern Recognition","volume":"92 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 Chinese Conference on Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCPR.2009.5344142","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5
Abstract
Many machine learning algorithms can be applied only to data described by categorical attributes. So discretizatioti of continuous attributes is one of the important steps in preprocessing of extracting knowledge. Traditional discretization algorithms based on clustering need a pre-determined clustering number k, also typically are applied in an unsupervised learning framework. This paper describes such an algorithm, called SX-means (Supervised X-means), which is a new algorithm of supervised discretization of continuous attributes on clustering. The algorithm modifies clusters with knowledge of the class distribution dynamically. And this procedure can not stop until the proper k is found. For the number of clusters k is not pre-determined by the user and class distribution is applied, the random of result is decreased greatly. Experimental evaluation of several discretization algorithms on six artificial data sets show that the proposed algorithm is more efficient and can generate a better discretization schema. Comparing the output of C4.5, resulting tree is smaller, less classification rules, and high accuracy of classification.