Using machine learning algorithms to identify predictors of social vulnerability in the event of a hazard: Istanbul case study

IF 4.7 · Zone 2 (Earth Sciences) · Q1 GEOSCIENCES, MULTIDISCIPLINARY · Natural Hazards and Earth System Sciences · Pub Date: 2023-06-15 · DOI: 10.5194/nhess-23-2133-2023

Oya Kalaycıoğlu, Serhat Emre Akhanli, E. Menteşe, M. Kalaycıoğlu, S. Kalaycioglu


Citations: 1

Abstract

To what extent an individual or group will be affected by the damage of a hazard depends not just on their exposure to the event but also on their social vulnerability, that is, how well they are able to anticipate, cope with, resist, and recover from the impact of a hazard. Therefore, to mitigate disaster risk effectively and build a society that is resilient to natural hazards, it is essential that policy makers develop an understanding of social vulnerability. This study aims to propose an optimal predictive model that allows decision makers to identify households with high social vulnerability by using a number of easily accessible household variables. To develop such a model, we rely on a large dataset from a household survey (n = 41 093) that was conducted to generate a social vulnerability index (SoVI) in Istanbul, Türkiye. We assessed the ability of socio-economic, socio-demographic, and housing conditions to predict household-level social vulnerability through machine learning models. We used classification and regression tree (CART), random forest (RF), support vector machine (SVM), naïve Bayes (NB), artificial neural network (ANN), k-nearest neighbours (KNN), and logistic regression to classify households with respect to their social vulnerability level, which was used as the outcome of these models. Because the outcome classes differed in size, subsampling strategies were applied to deal with the imbalanced data. Among these models, ANN was found to have the optimal predictive performance for discriminating between households with low and high social vulnerability when random majority undersampling was applied (area under the curve (AUC): 0.813). The results from the ANN method indicated that lack of social security, living in a squatter house, and job insecurity were among the most important predictors of social vulnerability to hazards. Additionally, the level of education, the ratio of elderly persons in the household, owning a property, household size, ratio of income earners, and savings of the household were found to be associated with social vulnerability. An open-access R Shiny web application was developed to visually display the performance of the machine learning (ML) methods, the important variables for classifying households with high and low social vulnerability, and the spatial distribution of the variables across Istanbul neighbourhoods. The machine learning methodology and the findings that we present in this paper can guide decision makers in identifying social vulnerability effectively and hence let them prioritise actions towards vulnerable groups, according to their needs, before a hazard event occurs.
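The abstract names two techniques without implementation details: random majority undersampling to balance the classes, and the area under the ROC curve (AUC) to compare models. The following is a minimal sketch of both in plain Python, not the authors' code. The function names, the toy labels, and the toy scores are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def random_majority_undersample(rows, labels, seed=0):
    """Randomly drop rows from larger classes until every class
    has as many rows as the smallest (minority) class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    minority_size = min(counts.values())
    keep = []
    for cls in counts:
        idx = [i for i, y in enumerate(labels) if y == cls]
        keep.extend(rng.sample(idx, minority_size))
    keep.sort()  # preserve the original row order
    return [rows[i] for i in keep], [labels[i] for i in keep]


def auc(scores, labels, positive=1):
    """AUC as the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 2 "high vulnerability" households (1) and 8 "low" (0).
toy_rows = [[i] for i in range(10)]
toy_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
balanced_rows, balanced_labels = random_majority_undersample(
    toy_rows, toy_labels, seed=42
)
balanced_counts = Counter(balanced_labels)  # 2 rows of each class

# Hypothetical classifier scores for four held-out households.
toy_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])  # 3 of 4 pairs ranked correctly
```

In practice, undersampling is usually applied to the training split only, so that the AUC is computed on held-out data that keeps the original class proportions.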


Source journal

Natural Hazards and Earth System Sciences (Geosciences, Multidisciplinary)

CiteScore: 7.60

Self-citation rate: 6.50%

Articles published: 192

Review time: 3.8 months

About the journal: Natural Hazards and Earth System Sciences (NHESS) is an interdisciplinary and international journal dedicated to the public discussion and open-access publication of high-quality studies and original research on natural hazards and their consequences. Embracing a holistic Earth system science approach, NHESS serves a wide and diverse community of research scientists, practitioners, and decision makers concerned with detection of natural hazards, monitoring and modelling, vulnerability and risk assessment, and the design and implementation of mitigation and adaptation strategies, including economic, societal, and educational aspects.