{"title":"Research on Gas User Clustering Algorithm: Based on PCA and Attribute Weighting","authors":"Xinbo Ai, Qinfang Ji","doi":"10.1145/3548608.3559307","DOIUrl":null,"url":null,"abstract":"Gas user data has the characteristics of large amount of data and multiple attributes, while traditional user clustering algorithms usually use the distance between samples as the division standard of similarity. This distance calculation method ignores the influence of different data attributes on clustering. In order to solve this problem, this paper proposes a clustering algorithm based on PCA and attribute weighted distance (PAWDK). The method is divided into two stages: feature extraction and attribute weighted clustering. First, PCA is performed on the data to reduce redundant attributes; secondly, a method is defined. The dispersion function reflecting the difference of the attribute characteristics weights the attribute characteristics; then, the distance between the data attributes is calculated according to the weighted attribute characteristics, and the weighted attribute distance of all attributes is summed as the similarity distance between samples; finally, the weighted attribute distance is used as the division standard of kmeans clustering algorithm to cluster data. 
Experiments show that compared with other clustering methods, PAWDK can effectively reduce noise, achieve the goal of effectively clustering high-dimensional user data, and is closer to the characteristics of real user data set division.","PeriodicalId":201434,"journal":{"name":"Proceedings of the 2022 2nd International Conference on Control and Intelligent Robotics","volume":"7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2022 2nd International Conference on Control and Intelligent Robotics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3548608.3559307","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citation count: 0
Abstract
Gas user data is large in volume and has many attributes, yet traditional user clustering algorithms usually use the plain distance between samples as the measure of similarity. This distance calculation ignores the differing influence of individual attributes on clustering. To address this problem, this paper proposes a clustering algorithm based on PCA and attribute-weighted distance (PAWDK). The method has two stages: feature extraction and attribute-weighted clustering. First, PCA is applied to the data to remove redundant attributes. Second, a dispersion function that reflects the differences among attribute features is defined and used to weight those features. Third, the distance along each attribute is computed from the weighted features, and the weighted attribute distances over all attributes are summed to give the similarity distance between samples. Finally, this weighted attribute distance is used as the partitioning criterion of the k-means algorithm to cluster the data. Experiments show that, compared with other clustering methods, PAWDK effectively reduces noise, clusters high-dimensional user data effectively, and produces partitions that more closely match the real grouping of the user data set.
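The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the dispersion function, so the sketch assumes each retained component is weighted by its share of the total variance. The function names (`pca_reduce`, `dispersion_weights`, `weighted_kmeans`) and the synthetic data are hypothetical, chosen only for illustration.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project X onto its top principal components (SVD of the centered data)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


def dispersion_weights(Z):
    """ASSUMPTION: weight each attribute by its share of the total variance.
    The paper's actual dispersion function is not stated in the abstract."""
    var = Z.var(axis=0)
    return var / var.sum()


def weighted_kmeans(Z, w, k, n_iter=100, seed=0):
    """k-means that assigns points using an attribute-weighted squared distance."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Weighted distance: sum over attributes j of w_j * (z_ij - c_kj)^2
        d = (((Z[:, None, :] - centers[None, :, :]) ** 2) * w).sum(axis=2)
        labels = d.argmin(axis=1)
        # Update each center to the mean of its members (keep it if empty)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Synthetic stand-in for user data: two well-separated groups, 5 attributes
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(50, 5)),
    rng.normal(5.0, 0.5, size=(50, 5)),
])

Z = pca_reduce(X, n_components=2)      # stage 1: feature extraction
w = dispersion_weights(Z)              # assumed attribute weights
labels, centers = weighted_kmeans(Z, w, k=2)  # stage 2: weighted clustering
```

Because the weights apply per attribute, the cluster mean still minimizes the weighted within-cluster distance, so the usual k-means update step can be kept unchanged; only the assignment step differs from standard k-means.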