An Euclidean Distance based on the Weighted Self-information Related Data Transformation for Nominal Data Clustering
Lei Gu, Liying Zhang, Yang Zhao
Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017-11-06
https://doi.org/10.1145/3132847.3133062
Clustering numerical data is tractable because well-defined measures such as the Euclidean distance apply directly, but clustering nominal data is much harder because nominal attribute values have no natural relative ordering. This paper aims to make the Euclidean distance suitable for nominal data clustering; the core idea is to transform each nominal attribute value into a numerical value. The transformation consists of three steps. First, the weighted self-information, which quantifies the amount of information in an attribute value, is calculated for each value of each nominal attribute. Second, the k nearest neighbors of each object are found, since an object's k nearest neighbors are closely similar to it. Third, the weighted self-information of each attribute value in each nominal object is modified according to that object's k nearest neighbors. To evaluate the proposed method, experiments are conducted on 10 data sets. The results show that the method not only enables the Euclidean distance to be used for nominal data clustering, but also achieves better clustering performance than several existing state-of-the-art approaches.
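The three steps can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything specific here is an assumption rather than the authors' method: the weighting is taken to be p(v) * -log2 p(v), where p(v) is the relative frequency of value v within its attribute; neighbors are found with the Hamming distance on the raw nominal values; and the "modification" step is a hypothetical blend (parameter `alpha`) of an object's own encoding with the mean encoding of its neighbors. The function names are illustrative only.

```python
import math
from collections import Counter


def weighted_self_information(column):
    """Step 1 (assumed form): map each value v to p(v) * -log2 p(v)."""
    n = len(column)
    counts = Counter(column)
    return {v: (c / n) * -math.log2(c / n) for v, c in counts.items()}


def hamming(a, b):
    """Number of attributes on which two nominal objects disagree."""
    return sum(x != y for x, y in zip(a, b))


def k_nearest_neighbors(data, i, k):
    """Step 2 (assumed metric): indices of the k objects closest to data[i]."""
    others = [j for j in range(len(data)) if j != i]
    # Ties are broken by index so the result is deterministic.
    others.sort(key=lambda j: (hamming(data[i], data[j]), j))
    return others[:k]


def transform(data, k=2, alpha=0.5):
    """Turn nominal objects into numeric vectors.

    Step 3 (hypothetical rule): blend each object's own encoding with
    the mean encoding of its k nearest neighbors, weighted by alpha.
    Requires 1 <= k < len(data).
    """
    columns = list(zip(*data))
    tables = [weighted_self_information(col) for col in columns]
    encoded = [[tables[a][v] for a, v in enumerate(row)] for row in data]

    result = []
    for i, row in enumerate(encoded):
        nbrs = k_nearest_neighbors(data, i, k)
        adjusted = []
        for a, own in enumerate(row):
            mean_nbr = sum(encoded[j][a] for j in nbrs) / len(nbrs)
            adjusted.append(alpha * own + (1 - alpha) * mean_nbr)
        result.append(adjusted)
    return result


def euclidean(u, v):
    """Ordinary Euclidean distance between two numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


# Toy data set: two nominal attributes (colour, size).
data = [
    ("red", "small"),
    ("red", "small"),
    ("blue", "large"),
    ("blue", "large"),
    ("red", "large"),
]
vectors = transform(data, k=2)
print(euclidean(vectors[0], vectors[1]))  # identical objects
print(euclidean(vectors[0], vectors[2]))  # objects differing on both attributes
```

Once the objects are numeric vectors, any Euclidean-distance-based clustering algorithm can be applied to `vectors`. Note that under the weighting assumed here, two values with equal frequency in the same attribute receive the same code before the neighbor-based adjustment.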