Issues of application of cluster analysis in fire statistics

B. Pranov
Journal: Technology of technosphere safety
DOI: 10.25257/tts.2021.4.94.117-124

Abstract

Introduction. In many cases it is impractical to describe and model a set of objects by building a single model for the whole set. A more accurate description requires dividing the initial set into groups of similar objects. This initial step is performed with cluster analysis methods, after which each cluster can be analysed in more detail.

Goals and objectives. The number of clusters into which the objects are partitioned is usually fixed in advance. The purpose of the article is to apply methods that optimize the number of clusters in the partition.

Methods. Cluster analysis methods were used together with information criteria for optimization.

Results and discussion. The clustering of the administrative-territorial subjects of the Russian Federation is considered. Seven indicators were taken as parameters for each subject: four in natural units (population, number of fires, fire damage and number of fire deaths) and three relative indicators. First, it was found that indicators in natural units and relative indicators had to be clustered separately, because the ranges of their values differ too greatly. The indicators in natural units were clustered in two ways: by hierarchical clustering (in SPSS) and by the k-means method (in Python, in a Jupyter notebook). The three relative indicators were processed in the same way.

Conclusions. Correlation analysis of the four natural-unit indicators revealed multicollinearity: the correlation coefficient between the number of fires and the number of fire deaths exceeded the threshold value of 0.8. The number of fires was retained for further analysis. Of the two methods used, hierarchical clustering and k-means clustering in the Jupyter notebook environment, the latter gave more meaningful results, making it possible to proceed to a more detailed study of the objects with similar parameters within each cluster.

Key words: cluster analysis, parameters, optimization of the number of clusters.
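The pipeline described in the abstract (screen the natural-unit indicators for multicollinearity with an |r| > 0.8 threshold, then run k-means for several candidate cluster counts and keep the best-scoring one) can be sketched in plain Python as below. This is a minimal illustration, not the author's notebook, and it rests on several assumptions: the nine "regions" and their figures are invented; each indicator is z-scored before clustering, since indicators in natural units can differ by orders of magnitude (population versus number of deaths, for example); seeding is deterministic farthest-first rather than random; and because the abstract does not name its information criterion, candidate values of k are scored with the mean silhouette coefficient, a geometric criterion swapped in for illustration rather than an information-theoretic one.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_collinear(columns, threshold=0.8):
    """Keep a column only if |r| <= threshold against every column kept before it."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) <= threshold for other in kept.values()):
            kept[name] = values
    return kept


def zscore(values):
    """Rescale one indicator to mean 0 and population standard deviation 1 (assumption)."""
    m = sum(values) / len(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return [(v - m) / sd for v in values]


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-first seeding; returns labels."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centre.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: move each centre to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # leave an empty cluster's centre in place
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; a point alone in its cluster scores 0."""
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within its own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == other) / labels.count(other)
            for other in set(labels) - {labels[i]}
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Made-up figures for nine hypothetical regions (NOT real fire statistics).
columns = {
    "fires": [10, 12, 11, 50, 52, 48, 90, 88, 92],
    "deaths": [1, 2, 1, 5, 5, 4, 9, 9, 10],
    "damage": [50, 52, 48, 10, 12, 11, 50, 52, 48],
}
kept = drop_collinear(columns)  # "deaths" is dropped: r(fires, deaths) > 0.8
points = list(zip(*(zscore(v) for v in kept.values())))
# Silhouette stands in for the unnamed information criterion (assumption).
k = max(range(2, 5), key=lambda c: silhouette(points, kmeans(points, c)))
labels = kmeans(points, k)
print(list(kept), k, labels)  # kept columns: ['fires', 'damage'], k: 3
```

On this toy data the deaths column is dropped (its correlation with fires is about 0.99), and k = 3 recovers the three blocks of three regions. Column order matters in `drop_collinear`: listing fires before deaths reproduces the abstract's choice of retaining the number of fires.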