Issues of application of cluster analysis in fire statistics
B. Pranov
Technology of technosphere safety
DOI: 10.25257/tts.2021.4.94.117-124 (https://doi.org/10.25257/tts.2021.4.94.117-124)
Abstract
Introduction. Describing and modeling a set of objects by building a single model for the whole set is often impractical. A more accurate description requires first dividing the set into groups of similar objects. This initial stage is carried out with cluster analysis methods, after which each cluster can be analyzed in more detail.

Goals and objectives. The number of clusters into which the set of objects is divided is usually fixed in advance. The purpose of the article is to apply methods that optimize the number of clusters in the partition.

Methods. Cluster analysis methods and information optimization criteria were used.

Results and discussion. The clustering procedure is considered for the administrative-territorial subjects of the Russian Federation. Seven indicators were taken as parameters for each subject: four in natural units (population, number of fires, damage from fires, number of deaths in fires) and three relative indicators. It was found first that clustering had to be carried out separately for the indicators in natural units and for the relative indicators, because the ranges of values in the two groups differ too much. For the indicators in natural units, clustering was carried out in two ways: hierarchical clustering in SPSS, and k-means clustering in Python in a Jupyter notebook. The same was done for the three relative indicators.

Conclusions. Correlation analysis of the four indicators in natural units revealed multicollinearity: the correlation coefficient between the number of fires and the number of fire deaths exceeded the threshold value of 0.8. The number of fires was retained for further analysis. Of the two methods used, hierarchical clustering and k-means clustering, the latter gave more meaningful results, making it possible to move on to a more detailed study of the objects with similar parameters within each cluster.

Key words: cluster analysis, parameters, optimization of the number of clusters.
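
The workflow summarized in the abstract (a correlation check against the 0.8 threshold, followed by k-means with a data-driven choice of the number of clusters) can be sketched in Python roughly as follows. This is a minimal illustration under several assumptions, not the author's code. The data are synthetic. The helper names `drop_collinear` and `choose_k` are hypothetical. The use of scikit-learn and of standardization before clustering is assumed. The abstract does not name the information criterion used, so the silhouette score stands in for it here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def drop_collinear(X, names, threshold=0.8):
    """Keep a column only if its |Pearson r| with every column kept so far
    does not exceed the threshold (greedy, in column order)."""
    corr = np.corrcoef(X, rowvar=False)
    keep = []
    for j in range(X.shape[1]):
        if all(abs(corr[j, i]) <= threshold for i in keep):
            keep.append(j)
    return X[:, keep], [names[j] for j in keep]


def choose_k(X, k_values=range(2, 7), seed=0):
    """Run k-means for each candidate k on standardized data and return the k
    with the highest silhouette score (a stand-in selection criterion)."""
    Z = StandardScaler().fit_transform(X)  # assumption: z-score scaling
    scores, labelings = {}, {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(Z)
        labelings[k] = model.labels_
        scores[k] = silhouette_score(Z, model.labels_)
    best_k = max(scores, key=scores.get)
    return best_k, labelings[best_k], scores


# Synthetic "regions": four made-up profiles, 15 regions each.
# Columns: population, number of fires, damage (arbitrary units).
rng = np.random.default_rng(0)
centers = np.array([
    [1.0e6, 1000.0, 400.0],
    [3.0e6, 1000.0, 100.0],
    [1.0e6, 4000.0, 100.0],
    [3.0e6, 4000.0, 400.0],
])
base = np.repeat(centers, 15, axis=0)
base = base * rng.normal(1.0, 0.05, size=base.shape)  # 5% multiplicative noise

# Deaths are generated proportional to fires, so the pair is strongly correlated.
deaths = 0.07 * base[:, 1] * rng.normal(1.0, 0.05, size=base.shape[0])

names = ["population", "fires", "damage", "deaths"]
X = np.column_stack([base, deaths])

X_kept, kept = drop_collinear(X, names)
best_k, labels, scores = choose_k(X_kept)

print("indicators kept:", kept)
print("silhouette by k:", {k: round(float(s), 3) for k, s in scores.items()})
print("selected k:", best_k)
```

Because `drop_collinear` walks the columns in order, the earlier column of a correlated pair is the one kept, which here mirrors the abstract's decision to retain the number of fires. The same two functions could be applied separately to the relative indicators, in line with the abstract's remark that the two groups of indicators were clustered separately.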