Towards robust validation strategies for EO flood maps

Remote Sensing of Environment (IF 11.4; Zone 1, Earth Sciences; Q1, Environmental Sciences) | Pub Date: 2024-10-03 | DOI: 10.1016/j.rse.2024.114439

Tim Landwehr, Antara Dasgupta, Björn Waske

Remote Sensing of Environment, Volume 315, Article 114439. Journal article, not open access. https://www.sciencedirect.com/science/article/pii/S0034425724004656

Citations: 0

Abstract

Flood maps based on Earth Observation (EO) data inform critical decision-making at almost every stage of the disaster management cycle, directly affecting the ability of affected individuals and governments to receive aid and informing policies on future adaptation. However, flood map validation is complicated by the class imbalance between flood and non-flood classes, which has rarely been investigated. There are currently no established best practices for addressing this issue, and the accuracy assessment of these maps is often treated as a mere formality, which undermines user trust in flood map products and limits their operational use and uptake. This paper provides the first comprehensive assessment of the impact of current EO-based flood map validation practices. Using flood inundation maps derived from Sentinel-1 synthetic aperture radar data with synthetically generated controlled errors, and Copernicus Emergency Management Service flood maps as the ground truth, binary metrics for quantifying flood detection accuracy were statistically evaluated for events under varying flood conditions. In particular, class-specific metrics were found to be sensitive to class imbalance, i.e. larger flood magnitudes result in higher metric scores, which naturally biases them towards overpredicting classifiers. Metric stability across error percentiles and flood magnitudes was assessed through the standard deviation calculated by bootstrapping, to quantify the impact of sample selection subjectivity; stratified sampling schemes consistently exhibited the lowest standard deviation. Thoughtful sample and response design was critical, with probability-based random sampling and proportional or equal class allocation vital to producing robust accuracy estimates comparable across study sites, error classes, and flood magnitudes.
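The resampling idea described above can be sketched in a few lines. The following is an illustration only, not the authors' procedure: the synthetic reference and predicted maps, the 10% flooded fraction, the 5% error rate, and the helper names `stratified_sample` and `bootstrap_std` are all assumptions made for the example. It repeatedly draws an equal-allocation stratified sample of validation points (with replacement) and reports the spread of overall accuracy across draws.

```python
import numpy as np


def overall_accuracy(ref, pred):
    """Fraction of sampled points where prediction matches reference."""
    return float(np.mean(ref == pred))


def stratified_sample(strata, n_per_class, rng):
    """Draw n_per_class indices (with replacement) from each stratum: equal allocation."""
    picks = []
    for c in np.unique(strata):
        members = np.flatnonzero(strata == c)
        picks.append(rng.choice(members, size=n_per_class, replace=True))
    return np.concatenate(picks)


def bootstrap_std(ref, pred, n_per_class=100, n_boot=200, seed=0):
    """Mean and standard deviation of overall accuracy over repeated stratified draws."""
    rng = np.random.default_rng(seed)
    scores = np.empty(n_boot)
    for b in range(n_boot):
        idx = stratified_sample(ref, n_per_class, rng)
        scores[b] = overall_accuracy(ref[idx], pred[idx])
    return scores.mean(), scores.std(ddof=1)


# Hypothetical data: about 10% of pixels flooded, 5% of labels flipped at random.
rng = np.random.default_rng(42)
ref = (rng.random(50_000) < 0.10).astype(int)
flip = rng.random(ref.size) < 0.05
pred = np.where(flip, 1 - ref, ref)

mean_oa, std_oa = bootstrap_std(ref, pred)
print(f"overall accuracy: mean={mean_oa:.3f}, std={std_oa:.3f}")
```

With equal allocation, the unweighted accuracy computed here is not an area-weighted estimate of map accuracy; the sketch is meant only to show how the standard deviation across draws quantifies the effect of sample selection.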
Results suggest that popular evaluation metrics such as the F1-score are in fact unsuitable for accurately characterizing map quality and are not comparable across different study sites or events. Overall accuracy and the Matthews correlation coefficient (MCC) are shown to be the most robust performance metrics when sampling designs are optimized, and bootstrapping is demonstrated to be a necessary tool for estimating the variability in map accuracy that arises from the spatial sampling of validation points. The results presented herein pave the way for the development of global flood map validation guidelines, to support wider use of and trust in EO-derived flood risk and recovery products, eventually allowing us to unlock the full potential of EO for improved flood resilience.
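For readers unfamiliar with the metrics named here, the following minimal sketch computes overall accuracy (OA), F1-score, and MCC from a binary confusion matrix. It does not reproduce the paper's experiments: the classifier with fixed 90% true-positive and true-negative rates, the flooded fractions, and the helper names `binary_metrics` and `expected_confusion` are assumptions chosen for illustration. Under that specific assumption, F1 rises with the flooded fraction even though the per-class error rates never change, which illustrates the kind of dependence on class balance that the abstract describes.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Overall accuracy, F1-score and MCC from confusion-matrix counts."""
    total = tp + fp + fn + tn
    oa = (tp + tn) / total
    f1 = 2 * tp / (2 * tp + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return {"OA": oa, "F1": f1, "MCC": mcc}


def expected_confusion(prevalence, tpr=0.9, tnr=0.9, n=10_000):
    """Expected counts for a hypothetical classifier with fixed per-class rates."""
    pos = prevalence * n          # flooded reference pixels
    neg = n - pos                 # non-flooded reference pixels
    tp = tpr * pos
    fn = pos - tp
    tn = tnr * neg
    fp = neg - tn
    return tp, fp, fn, tn


# Assumed flooded fractions of the scene: 5%, 20% and 50%.
results = {p: binary_metrics(*expected_confusion(p)) for p in (0.05, 0.20, 0.50)}
for p, m in results.items():
    print(f"flooded fraction {p:.2f}: "
          f"OA={m['OA']:.3f}  F1={m['F1']:.3f}  MCC={m['MCC']:.3f}")
```

Because the true-positive and true-negative rates are set equal in this toy setup, OA stays constant by construction; real classifiers will not generally behave this way, so the printed values should be read as an illustration of the formulas rather than as evidence for any ranking of metrics.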




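Source journal

Remote Sensing of Environment (Environmental Sciences; Imaging Science & Photographic Technology)

CiteScore: 25.10
Self-citation rate: 8.90%
Articles published: 455
Review time: 53 days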

About the journal: Remote Sensing of Environment (RSE) serves the Earth observation community by disseminating results on the theory, science, applications, and technology that contribute to advancing the field of remote sensing. With a thoroughly interdisciplinary approach, RSE encompasses terrestrial, oceanic, and atmospheric sensing. The journal emphasizes biophysical and quantitative approaches to remote sensing at local to global scales, covering a diverse range of applications and techniques. RSE serves as a vital platform for the exchange of knowledge and advancements in the dynamic field of remote sensing.