Examining distributional characteristics of clusters.
A von Eye
Bulletin de la Societe des sciences medicales du Grand-Duche de Luxembourg, volume Spec No 1 1, pages 14-39. Journal Article, published 2010-01-01.
Abstract
Standard cluster analysis forms clusters on the criterion that the members of a cluster be closer to each other than to members of other clusters. This article proposes examining the empirical clusters that result from standard clustering to assess whether they contradict distributional assumptions. Four models are proposed. They combine two data generation processes, the Poisson and the multinormal, with two convex shapes of cluster hulls, the spherical and the ellipsoidal. Under each model, the probability of a case falling in a cluster of a given location, size, and shape is estimated and compared with the observed proportion of cases. The observed proportion can turn out to be larger than, equal to, or smaller than expected. Examples are given using simulated and empirical data. The simulation showed that the size of a cluster, the data generation process, and the true distribution of the data have the strongest effects on the results obtained with the proposed method. The empirical examples examine distributional characteristics of cross-sectional and longitudinal clusters of aggressive behavior in adolescents. They show that clustering methods do not always yield clusters that contradict distributional assumptions; some clusters contain even fewer cases than expected.
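The comparison of an expected with an observed proportion can be illustrated with a minimal Python sketch for the spherical hull only. The abstract does not give the article's formulas or test statistic, so everything here is an assumption: the function names are hypothetical, the Poisson-type model is taken to be a homogeneous spread over a rectangular data space that fully contains the sphere, the multinormal model is taken to be standard normal with identity covariance, and the z statistic is an ordinary normal approximation rather than the article's own test.

```python
import math
import random


def ball_volume(radius, dim):
    """Volume of a dim-dimensional ball with the given radius."""
    return math.pi ** (dim / 2) / math.gamma(dim / 2 + 1) * radius ** dim


def expected_prob_uniform(radius, lows, highs):
    """Expected proportion under a homogeneous (Poisson-type) model.

    Assumption: cases are spread evenly over the box [lows, highs] and the
    spherical hull lies entirely inside that box, so the probability is the
    ratio of the ball volume to the box volume.
    """
    box_volume = math.prod(h - l for l, h in zip(lows, highs))
    return ball_volume(radius, len(lows)) / box_volume


def expected_prob_normal(center, radius, n_draws=200_000, seed=0):
    """Monte Carlo estimate of P(||X - center|| <= radius), X ~ N(0, I).

    Assumption: variables are standardized and uncorrelated.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        dist_sq = sum((rng.gauss(0.0, 1.0) - c) ** 2 for c in center)
        hits += dist_sq <= radius ** 2
    return hits / n_draws


def z_statistic(n_observed, n_total, p_expected):
    """Normal-approximation z for an observed versus expected cluster count."""
    expected = n_total * p_expected
    return (n_observed - expected) / math.sqrt(expected * (1 - p_expected))


if __name__ == "__main__":
    # Hypothetical cluster: 60 of 200 cases inside a unit circle at the origin.
    p_uni = expected_prob_uniform(1.0, [-3.0, -3.0], [3.0, 3.0])
    p_norm = expected_prob_normal([0.0, 0.0], 1.0)
    print("uniform model:    p =", round(p_uni, 4), "z =", round(z_statistic(60, 200, p_uni), 2))
    print("multinormal model: p =", round(p_norm, 4), "z =", round(z_statistic(60, 200, p_norm), 2))
```

A positive z indicates more cases in the cluster than the assumed model predicts, a negative z fewer, which mirrors the three possible outcomes named in the abstract. The same hypothetical cluster can look overpopulated under one model and unremarkable under the other, so the choice of data generation process matters for the verdict.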