Earth System Science Data: latest articles, page 8

Optimal feature selection for improved ML based reconstruction of Global Terrestrial Water Storage Anomalies

IF 11.4 Zone 1 (Earth Sciences) Q1 GEOSCIENCES, MULTIDISCIPLINARY

Earth System Science Data

Pub Date : 2024-05-17 DOI: 10.5194/essd-2024-109

Nehar Mandal, Prabal Das, Kironmala Chanda

Abstract. Understanding long-term Terrestrial water storage (TWS) variations is vital for investigating hydrological extreme events, managing water resources, and assessing climate change impacts. However, the limited data duration from the Gravity Recovery and Climate Experiment (GRACE) and its follow-on missions (GRACE-FO) poses challenges for comprehensive long-term analysis. In this study, we reconstruct TWS anomalies (TWSA) for the period Jan 1960 to Dec 2022 thereby filling data gaps between GRACE and GRACE-FO missions as well as generating a complete dataset for the pre-GRACE era. The workflow involves identifying optimal predictors from land surface model (LSM) outputs, meteorological variables, and climatic indices using a novel Bayesian Network (BN) technique for grid-based TWSA simulations. Climate indices, like the Oceanic Niño Index and Dipole Mode Index, are selected as optimal predictors for a large number of grids globally, along with TWSA from LSM outputs. The most effective machine learning (ML) algorithms among Convolutional Neural Network (CNN), Support Vector Regression (SVR), Extra Trees Regressor (ETR), and Stacking Ensemble Regression (SER) models are evaluated at each grid location to achieve optimal reproducibility. Globally, ETR performs best for most of the grids which is also noticed at the river-basin scale, particularly for the Ganga-Brahmaputra-Meghana, Godavari, Krishna, Limpopo, and Nile river basins. The simulated TWSA (BNML_TWSA) outperformed the TWSA from LSM outputs when evaluated against GRACE datasets. Improvements are particularly noted in the river basins such as Godavari, Krishna, Danube, Amazon, etc., with median values of the correlation coefficient, Nash-Sutcliffe efficiency, and RMSE for all grids in Godavari, India, being 0.927, 0.839, and 63.7 mm respectively. 
A comparison with TWSA reconstructed in recent studies indicates that the proposed BNML_TWSA outperforms them globally as well as for all the 11 major river basins examined. The presented dataset is published at https://doi.org/10.6084/m9.figshare.25376695 (Mandal et al., 2024) and updates will be published when needed.
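
The three skill scores quoted above (correlation coefficient, Nash-Sutcliffe efficiency, and RMSE) have standard definitions. A minimal sketch in plain Python follows; the function names and the short sample series are illustrative assumptions and are not taken from the paper or its dataset.

```python
import math


def pearson_r(obs, sim):
    """Pearson correlation coefficient between observed and simulated series."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov / math.sqrt(var_obs * var_sim)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    squared_error = sum((o - s) ** 2 for o, s in zip(obs, sim))
    obs_variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - squared_error / obs_variance


def rmse(obs, sim):
    """Root mean square error, in the same units as the inputs (e.g. mm)."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


# Hypothetical series: the simulation is the observation shifted by one unit.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 3.0, 4.0, 5.0]
scores = (pearson_r(observed, simulated), nse(observed, simulated), rmse(observed, simulated))
```

A constant offset leaves the correlation at 1 while lowering the Nash-Sutcliffe efficiency, which is one reason evaluations of this kind usually report the scores together.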

Citations: 0

Brazilian Atmospheric Inventories – BRAIN: a comprehensive database of air quality in Brazil

IF 11.4 Zone 1 (Earth Sciences) Q1 GEOSCIENCES, MULTIDISCIPLINARY

Earth System Science Data

Pub Date : 2024-05-16 DOI: 10.5194/essd-16-2385-2024

Leonardo Hoinaski, Robson Will, Camilo Bastos Ribeiro

Abstract. Developing air quality management systems to control the impacts of air pollution requires reliable data. However, current initiatives do not provide datasets with large spatial and temporal resolutions for developing air pollution policies in Brazil. Here, we introduce the Brazilian Atmospheric Inventories (BRAIN), the first comprehensive database of air quality and its drivers in Brazil. BRAIN encompasses hourly datasets of meteorology, emissions, and air quality. The emissions dataset includes vehicular emissions derived from the Brazilian Vehicular Emissions Inventory Software (BRAVES), industrial emissions produced with local data from the Brazilian environmental agencies, biomass burning emissions from FINN – Fire INventory from the National Center for Atmospheric Research (NCAR), and biogenic emissions from the Model of Emissions of Gases and Aerosols from Nature (MEGAN) (https://doi.org/10.57760/sciencedb.09858, Hoinaski et al., 2023a; https://doi.org/10.57760/sciencedb.09886, Hoinaski et al., 2023b). The meteorology dataset has been derived from the Weather Research and Forecasting Model (WRF) (https://doi.org/10.57760/sciencedb.09857, Hoinaski and Will, 2023a; https://doi.org/10.57760/sciencedb.09885, Hoinaski and Will, 2023c). The air quality dataset contains the surface concentration of 216 air pollutants produced from coupling meteorological and emissions datasets with the Community Multiscale Air Quality Modeling System (CMAQ) (https://doi.org/10.57760/sciencedb.09859, Hoinaski and Will, 2023b; https://doi.org/10.57760/sciencedb.09884, Hoinaski and Will, 2023d). We provide gridded data in two domains, one covering the Brazilian territory with 20×20 km spatial resolution and another covering southern Brazil with 4×4 km spatial resolution. This paper describes how the datasets were produced, their limitations, and their spatiotemporal features. 
To evaluate the quality of the database, we compare the air quality dataset with 244 air quality monitoring stations, providing the model's performance for each pollutant measured by the monitoring stations. We present a sample of the spatial variability of emissions, meteorology, and air quality in Brazil from 2019, revealing the hotspots of emissions and air pollution issues. By making BRAIN publicly available, we aim to provide the required data for developing air quality policies on municipal and state scales, especially for under-developed and data-scarce municipalities. We also envision that BRAIN has the potential to create new insights into and opportunities for air pollution research in Brazil.

Citations: 0

European topsoil bulk density and organic carbon stock database (0–20 cm) using machine-learning-based pedotransfer functions

IF 11.4 Zone 1 (Earth Sciences) Q1 GEOSCIENCES, MULTIDISCIPLINARY

Earth System Science Data

Pub Date : 2024-05-16 DOI: 10.5194/essd-16-2367-2024

Songchao Chen, Zhongxing Chen, Xianglin Zhang, Zhongkui Luo, Calogero Schillaci, Dominique Arrouays, Anne Christine Richer-de-Forges, Zhou Shi

Abstract. Soil bulk density (BD) serves as a fundamental indicator of soil health and quality, exerting a significant influence on critical factors such as plant growth, nutrient availability, and water retention. Due to its limited availability in soil databases, the application of pedotransfer functions (PTFs) has emerged as a potent tool for predicting BD using other easily measurable soil properties, while the impact of these PTFs' performance on soil organic carbon (SOC) stock calculation has been rarely explored. In this study, we proposed an innovative local modeling approach for predicting BD of fine earth (BDfine) across Europe using the recently released BDfine data from the LUCAS Soil (Land Use and Coverage Area Frame Survey Soil) 2018 (0–20 cm) and relevant predictors. Our approach involved a combination of neighbor sample search, forward recursive feature selection (FRFS), and random forest (RF) models (local-RFFRFS). The results showed that local-RFFRFS had a good performance in predicting BDfine (R2 of 0.58, root mean square error (RMSE) of 0.19 g cm−3, relative error (RE) of 16.27 %), surpassing the earlier-published PTFs (R2 of 0.40–0.45, RMSE of 0.22 g cm−3, RE of 19.11 %–21.18 %) and global PTFs using RF models with and without FRFS (R2 of 0.56–0.57, RMSE of 0.19 g cm−3, RE of 16.47 %–16.74 %). Interestingly, we found that the best earlier-published PTF (R2 = 0.84, RMSE = 1.39 kg m−2, RE of 17.57 %) performed close to the local-RFFRFS (R2 = 0.85, RMSE = 1.32 kg m−2, RE of 15.01 %) in SOC stock calculation using BDfine predictions. However, the local-RFFRFS still performed better (ΔR2 > 0.2) for soil samples with low SOC stocks (< 3 kg m−2). Therefore, we suggest that the local-RFFRFS is a promising method for BDfine prediction, while earlier-published PTFs would be more efficient when BDfine is subsequently utilized for calculating SOC stock. 
Finally, we produced two topsoil BDfine and SOC stock datasets (18 945 and 15 389 soil samples) at 0–20 cm for LUCAS Soil 2018 using the best earlier-published PTF and local-RFFRFS, respectively. This dataset is archived on the Zenodo platform at https://doi.org/10.5281/zenodo.10211884 (S. Chen et al., 2023). The outcomes of this study present a meaningful advancement in enhancing the predictive accuracy of BDfine, and the resultant BDfine and SOC stock datasets for topsoil across Europe enable more precise soil hydrological and biological modeling.
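
To make the link between bulk density and SOC stock concrete, the sketch below uses the generic textbook relation between SOC content, bulk density, and layer thickness. It is an assumption for illustration only: the abstract does not state the exact formula the authors applied, and the function name, parameter names, and sample values are hypothetical.

```python
def soc_stock_kg_m2(soc_g_per_kg, bulk_density_g_cm3, depth_cm, coarse_fraction=0.0):
    """Generic SOC stock for one soil layer, in kg C per square metre.

    soc_g_per_kg       -- SOC content of the soil (g C per kg soil)
    bulk_density_g_cm3 -- bulk density (g per cubic centimetre)
    depth_cm           -- layer thickness (cm), e.g. 20 for a 0-20 cm topsoil layer
    coarse_fraction    -- volume fraction of coarse fragments (0 to 1), assumed carbon-free
    """
    # g/kg * g/cm3 * cm gives 1e-3 g C per cm2, i.e. 0.01 kg C per m2,
    # hence the division by 100.
    return soc_g_per_kg * bulk_density_g_cm3 * depth_cm * (1.0 - coarse_fraction) / 100.0


# Hypothetical sample: 20 g/kg SOC, 1.2 g/cm3 bulk density, 0-20 cm layer.
stock = soc_stock_kg_m2(20.0, 1.2, 20.0)  # about 4.8 kg C per m2
```

Because the stock is proportional to bulk density in this relation, a relative error in the predicted bulk density carries over one-to-one to the stock, which is why the choice of PTF matters for SOC accounting.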

Citations: 0

Two sets of bias-corrected regional UK Climate Projections 2018 (UKCP18) of temperature, precipitation and potential evapotranspiration for Great Britain

IF 11.4 Zone 1 (Earth Sciences) Q1 GEOSCIENCES, MULTIDISCIPLINARY

Earth System Science Data

Pub Date : 2024-05-16 DOI: 10.5194/essd-2024-132

Nele Reyniers, Qianyu Zha, Nans Addor, Timothy J. Osborn, Nicole Forstenhäusler, Yi He

Abstract. The United Kingdom Climate Projections 2018 (UKCP18) regional climate model (RCM) 12 km regional perturbed physics ensemble (UKCP18-RCM-PPE) is one of the three strands of the latest set of UK national climate projections produced by the UK Met Office. It has been widely adopted in climate impact assessment. In this study, we report biases in the raw UKCP18-RCM simulations that are significant and are likely to deteriorate impact assessments if they are not adjusted. Two methods were used to bias-correct UKCP18-RCM: non-parametric quantile mapping using empirical quantiles and a variant developed for the third phase of the Inter-Sectoral Impact Model Intercomparison Project (ISIMIP) designed to preserve the climate change signal. Specifically, daily temperature and precipitation simulations for 1981 to 2080 were adjusted for the 12 ensemble members. Potential evapotranspiration was also estimated over the same period using the Penman-Monteith formulation and then bias-corrected using the latter method. Both methods successfully corrected biases in a range of daily temperature, precipitation and potential evapotranspiration metrics, and reduced biases in multi-day precipitation metrics to a lesser degree. An exploratory analysis of the projected future changes confirms the expectation of wetter, warmer winters and hotter, drier summers, and shows uneven changes in different parts of the distributions of both temperature and precipitation. Both bias-correction methods preserved the climate change signal almost equally well, as well as the spread among the projected changes. The change factor method was used as a benchmark for precipitation, and we show that it fails to capture changes in a range of variables, making it inadequate for most impact assessments. 
By comparing the differences between the two bias-correction methods and within the 12 ensemble members, we show that the uncertainty in future precipitation and temperature changes stemming from the climate model parameterisation far outweighs the uncertainty introduced by selecting one of these two bias-correction methods. We conclude by providing guidance on the use of the bias-corrected data sets. The data sets bias adjusted with ISIMIP3BA are publicly available in the following repositories: https://doi.org/10.5281/zenodo.6337381 for precipitation and temperature (Reyniers et al., 2022a) and https://doi.org/10.5281/zenodo.6320707 for potential evapotranspiration (Reyniers et al., 2022b). The datasets bias-corrected using the quantile mapping method are available at https://doi.org/10.5281/zenodo.8223024 (Zha et al., 2023).
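
Empirical quantile mapping, the first of the two methods named above, can be sketched as follows. This is a simplified illustration in plain Python and not the authors' implementation: the function names and sample values are hypothetical, values outside the historical model range are simply clamped to its ends, and the trend-preserving ISIMIP variant is not reproduced.

```python
from bisect import bisect_right


def cdf_position(sorted_sample, x):
    """Position of x in the empirical distribution, from 0 (minimum) to 1 (maximum)."""
    n = len(sorted_sample)
    if x <= sorted_sample[0]:
        return 0.0
    if x >= sorted_sample[-1]:
        return 1.0
    i = bisect_right(sorted_sample, x) - 1  # sorted_sample[i] <= x < sorted_sample[i + 1]
    lower, upper = sorted_sample[i], sorted_sample[i + 1]
    return (i + (x - lower) / (upper - lower)) / (n - 1)


def empirical_quantile(sorted_sample, p):
    """Linearly interpolated quantile of a sorted sample for p in [0, 1]."""
    pos = p * (len(sorted_sample) - 1)
    lower_idx = int(pos)
    upper_idx = min(lower_idx + 1, len(sorted_sample) - 1)
    frac = pos - lower_idx
    return sorted_sample[lower_idx] * (1.0 - frac) + sorted_sample[upper_idx] * frac


def quantile_map(value, model_hist, obs_hist):
    """Map a model value onto the observed distribution at the same quantile."""
    p = cdf_position(sorted(model_hist), value)
    return empirical_quantile(sorted(obs_hist), p)


# Hypothetical calibration samples: the model is biased low by a factor of two.
model_hist = [0.0, 1.0, 2.0, 3.0, 4.0]
obs_hist = [0.0, 2.0, 4.0, 6.0, 8.0]
corrected = quantile_map(2.5, model_hist, obs_hist)  # 5.0
```

Each simulated value is replaced by the observed value that sits at the same quantile of the calibration period, so the corrected series inherits the observed distribution.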

Citations: 0

Evapotranspiration evaluation using three different protocols on a large green roof in the greater Paris area

IF 11.4 Zone 1 (Earth Sciences) Q1 GEOSCIENCES, MULTIDISCIPLINARY

Earth System Science Data

Pub Date : 2024-05-15 DOI: 10.5194/essd-16-2351-2024

Pierre-Antoine Versini, Leydy Alejandra Castellanos-Diaz, David Ramier, Ioulia Tchiguirinskaia

Abstract. Nature-based solutions have appeared as relevant solutions to mitigate urban heat islands. To improve our knowledge of the assessment of this ecosystem service and the related physical processes (evapotranspiration), monitoring campaigns are required. This was the objective of several experiments carried out on the Blue Green Wave, a large green roof located in Champs-sur-Marne (France). Three different protocols were implemented and tested to assess the evapotranspiration flux at different scales: the first one was based on the surface energy balance (large scale); the second one was carried out using an evapotranspiration chamber (small scale); and the third one was based on the water balance evaluated during dry periods (point scale). In addition to these evapotranspiration estimates, several hydrometeorological variables (especially temperature) were measured. Related data and Python programs providing preliminary elements of the analysis and graphical representation have been made available. They illustrate the space–time variability in the studied processes regarding their observation scale. The dataset is available at https://doi.org/10.5281/zenodo.8064053 (Versini et al., 2023).

Citations: 0

A 10 km daily-level ultraviolet radiation predicting dataset based on machine learning models in China from 2005 to 2020

IF 11.4 Zone 1 (Earth Sciences) Q1 GEOSCIENCES, MULTIDISCIPLINARY

Earth System Science Data

Pub Date : 2024-05-15 DOI: 10.5194/essd-2024-111

Yichen Jiang, Su Shi, Xinyue Li, Chang Xu, Haidong Kan, Bo Hu, Xia Meng

Abstract. Ultraviolet (UV) radiation is closely related to health, but limited measurements have hindered investigation of its health effects in China. Machine learning algorithms have been widely used to predict environmental factors with high accuracy, but few studies have applied them to UV radiation. This study aimed to develop a UV radiation prediction model based on the random forest method and to predict UV radiation at the daily level and 10 km resolution in mainland China for 2005–2020. A random forest model was employed to predict UV radiation by integrating ground UV radiation measurements from monitoring stations with multiple predictors, such as satellite UV radiation data. Missing satellite-based UV radiation data were filled using a three-day moving average. The model's performance was evaluated through multiple cross-validation (CV) methods. At the daily level, the overall R² (root mean square error, RMSE) between measured and predicted UV radiation was 0.97 (15.64 W m⁻²) for model development and 0.83 (37.44 W m⁻²) for 10-fold CV. The model with OMI EDD achieved higher prediction accuracy than the one without it. Based on predictions of UV radiation at the daily level and 10 km spatial resolution with nearly 100 % spatiotemporal coverage, we found that UV radiation increased by 4.20 % while PM2.5 levels decreased by 48.51 % and O₃ levels rose by 22.70 % in 2013–2020, suggesting a potential correlation among these environmental factors. The uneven spatial distribution of UV radiation was found to be associated with factors such as latitude, elevation, meteorological factors and seasons. The eastern areas of China posed a higher risk, with both high population density and high UV radiation intensity.
Using a machine learning algorithm, this study generated a gridded UV radiation dataset characterized by relatively high precision and extensive spatiotemporal coverage, which demonstrates the spatiotemporal variability of UV radiation levels in China and can facilitate future health-related research. This dataset is currently freely available at https://doi.org/10.5281/zenodo.10884591 (Jiang et al., 2024).
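
The abstract mentions that gaps in the satellite UV series were filled with a three-day moving average but does not give the exact window. The sketch below assumes a centred window (previous day, current day, next day) and averages whichever of those values are present; the function name and this window choice are illustrative assumptions, not the authors' documented procedure.

```python
from typing import List, Optional


def fill_gaps_three_day(series: List[Optional[float]]) -> List[Optional[float]]:
    """Fill missing days (None) with the mean of valid values in a centred 3-day window.

    Only original observations are used as inputs, so a filled value never
    feeds into the filling of its neighbour. Days whose whole window is
    missing are left as None.
    """
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        window = series[max(0, i - 1): i + 2]
        valid = [v for v in window if v is not None]
        if valid:
            filled[i] = sum(valid) / len(valid)
    return filled
```

A single missing day between two observations takes their mean, while a missing day next to only one observation simply copies it.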

Citations: 0

Global mapping of oil palm planting year from 1990 to 2021

IF 11.4 Zone 1 (Earth Sciences) Q1 GEOSCIENCES, MULTIDISCIPLINARY

Earth System Science Data

Pub Date : 2024-05-15 DOI: 10.5194/essd-2024-157

Adrià Descals, David L. A. Gaveau, Serge Wich, Zoltan Szantoi, Erik Meijaard

Abstract. Oil palm is a controversial crop, primarily because it is associated with negative environmental impacts such as tropical deforestation. Mapping the crop and its characteristics, such as age, is crucial for informing public and policy discussions regarding these impacts. Oil palm has received substantial mapping efforts, but up-to-date accurate oil palm maps for both extent and age are essential for monitoring impacts and informing concomitant debate. Here, we present a 10-meter resolution global map of industrial and smallholder oil palm, developed using Sentinel-1 data for the years 2016–2021 and a deep learning model based on convolutional neural networks. In addition, we used Landsat-5, -7, and -8 to estimate the planting year from 1990 to 2021 at a 30-meter spatial resolution. The planting year indicates the year of establishment for an oil palm plantation as of 2021, either newly planted or replanted oil palm in an existing plantation. We validated the oil palm extent layer using 17,812 randomly distributed reference points. The accuracy of the planting year layer was assessed using field data collected from 5,831 industrial parcels and 1,012 smallholder plantations distributed throughout the oil palm growing area. We found oil palm plantations covering a total mapped area of 23.98 Mha, and our area estimates are 16.66 ± 0.25 Mha of industrial and 7.59 ± 0.29 Mha of smallholder oil palm worldwide. The producers’ and users’ accuracy is 91.9 ± 3.4 % and 91.8 ± 1.0 % for industrial plantations, and 72.7 ± 1.3 % and 75.7 ± 2.5 % for smallholders, which improves upon a previous global oil palm dataset, particularly in terms of omission of oil palm. The overall mean error between estimated planting year and field data was -0.24 years and the root-mean-square error was 2.65 years, but the agreement was lower for smallholders. 
Mapping the extent and planting year of smallholder plantations remains challenging, particularly for wild and sparsely planted oil palm, and future mapping efforts should focus on these specific types of plantations. The average oil palm plantation age was 14.1 years, and the area of oil palm over 20 years was 6.28 Mha. Given that oil palm plantations are typically replanted after 25 years, our findings indicate that this area will require replanting within the coming decade, starting from 2021. Our dataset provides valuable input for optimal land use planning to meet the growing global demand for vegetable oils. The global oil palm extent layer for the year 2021 and the planting year layer from 1990 to 2021 can be found at https://doi.org/10.5281/zenodo.11034131 (Descals, 2024).
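
The replanting argument in the abstract rests on simple arithmetic: a stand already older than 20 years in 2021 reaches the typical 25-year rotation within five years, well inside the coming decade. A minimal sketch of that reasoning follows; it assumes plantation age is the reference year minus the mapped planting year, and the constant and function names are hypothetical.

```python
REFERENCE_YEAR = 2021  # the planting-year layer is described "as of 2021"
ROTATION_YEARS = 25    # typical replanting age quoted in the abstract


def plantation_age(planting_year: int) -> int:
    """Age in years at the reference year (assumed: reference year minus planting year)."""
    return REFERENCE_YEAR - planting_year


def years_until_replanting(planting_year: int) -> int:
    """Years left before the stand reaches the rotation age; 0 if already past it."""
    return max(0, ROTATION_YEARS - plantation_age(planting_year))
```

Under these assumptions a stand planted in 2001 is 20 years old in 2021 and is due for replanting five years later.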

Citations: 0

A global monthly field of seawater pH over 3 decades: a machine learning approach

IF 11.4 Zone 1 (Earth Sciences) Q1 GEOSCIENCES, MULTIDISCIPLINARY

Earth System Science Data

Pub Date : 2024-05-15 DOI: 10.5194/essd-2024-151

Guorong Zhong, Xuegang Li, Jinming Song, Baoxiao Qu, Fan Wang, Yanjun Wang, Bin Zhang, Lijing Cheng, Jun Ma, Huamao Yuan, Liqin Duan, Ning Li, Qidong Wang, Jianwei Xing, Jiajia Dai

Abstract. The continuous uptake of anthropogenic CO₂ by the ocean leads to ocean acidification, which is an ongoing threat to the marine ecosystem. The ocean acidification rate has been documented globally in the surface ocean, but documentation below the surface is limited. Here, we present a monthly four-dimensional 1°×1° gridded product of global seawater pH, derived from a machine learning algorithm trained on pH observations on the total scale and at in-situ temperature from the Global Ocean Data Analysis Project (GLODAP). The constructed pH product covers the years 1992–2020 and depths from the surface to 2 km on 41 levels. Three types of machine learning algorithms were used in the pH product construction: self-organizing map neural networks for region division, a stepwise algorithm for predictor selection, and feed-forward neural networks (FFNN) for non-linear regression. The performance of the machine learning algorithm was validated against real observations by cross-validation, in which four iterations were carried out, each holding out a different 25 % of the observations for evaluation and using the remaining 75 % for training. The constructed pH product is evaluated through comparisons to time series observations and the GLODAP pH climatology. The overall root mean square error between the FFNN-constructed pH and the GLODAP measurements is 0.028, ranging from 0.044 at the surface to 0.013 at 2000 m. The pH product is distributed through the data repository of the Marine Science Data Center of the Chinese Academy of Sciences at http://dx.doi.org/10.12157/IOCAS.20230720.001 (Zhong et al., 2023).
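
The validation scheme described above, four iterations each holding out 25 % of the observations and training on the other 75 %, corresponds to a standard four-fold cross-validation. The sketch below shows that scheme with a root mean square error score; the function names, the random shuffling, and the placeholder mean-value model are assumptions for illustration and stand in for the FFNN used in the study.

```python
import math
import random
from typing import Callable, List, Sequence


def four_fold_rmse(
    x: Sequence[float],
    y: Sequence[float],
    fit: Callable[[List[float], List[float]], Callable[[float], float]],
    seed: int = 0,
) -> List[float]:
    """Return the RMSE of each of four folds (needs at least four observations).

    Each fold holds out about 25 % of the observations for evaluation and
    trains on the remaining 75 %; every observation is held out exactly once.
    """
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)
    folds = [indices[k::4] for k in range(4)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        model = fit([x[i] for i in train], [y[i] for i in train])
        squared = [(model(x[i]) - y[i]) ** 2 for i in held_out]
        scores.append(math.sqrt(sum(squared) / len(squared)))
    return scores


def fit_mean(xs: List[float], ys: List[float]) -> Callable[[float], float]:
    """Placeholder model: always predicts the mean of the training targets."""
    mean = sum(ys) / len(ys)
    return lambda _x: mean
```

Calling `four_fold_rmse(x, y, fit_mean)` returns four scores, one per held-out quarter; any function with the same `fit` signature can replace the placeholder.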

Citations: 0

Retrieving Ground-Level PM2.5 Concentrations in China (2013–2021) with a Numerical Model-Informed Testbed to Mitigate Sample Imbalance-Induced Biases

IF 11.4 Zone 1 (Earth Sciences) Q1 GEOSCIENCES, MULTIDISCIPLINARY

Earth System Science Data

Pub Date : 2024-05-15 DOI: 10.5194/essd-2024-170

Siwei Li, Yu Ding, Jia Xing, Joshua S. Fu

Abstract. Ground-level PM2.5 data derived from satellites with machine learning are crucial for health and climate assessments; however, uncertainties persist due to the absence of observations with full spatial coverage. To address this, we propose a novel testbed using untraditional numerical simulations to evaluate PM2.5 estimation across the entire spatial domain. The testbed emulates the general machine-learning approach by training the model with grids corresponding to ground monitor sites and subsequently testing its predictive accuracy for other locations. Our approach enables, for the first time, a comprehensive evaluation of the performance of various machine-learning methods in estimating PM2.5 across the spatial domain. The application in China shows unexpected results: across all benchmark models, larger PM2.5 biases are found in densely populated regions with abundant ground observations, which challenges conventional expectations and has not been explored in the recent literature. The main reason is the imbalance in training samples, which come mostly from urban areas with high emissions; because monitors are lacking in downwind areas, where PM2.5 is transported from urban areas with varying vertical profiles, this imbalance leads to significant overestimation. Our proposed testbed also provides an efficient strategy for optimizing model structure or training samples to enhance satellite-retrieval model performance. Integration of spatiotemporal features, especially with CNN-based deep-learning approaches like the ResNet model, successfully mitigates PM2.5 overestimation (by 5–30 µg m⁻³) and corresponding exposure (by 3 million people · µg m⁻³) in the downwind area over the past nine years (2013–2021) compared to the traditional approach.
Furthermore, the incorporation of 600 strategically positioned ground-measurement sites identified through the testbed is essential to achieve a more balanced distribution of training samples, thereby ensuring precise PM2.5 estimation and facilitating the assessment of associated impacts in China. In addition to presenting the retrieved surface PM2.5 concentrations in China from 2013 to 2021, this study provides a testbed dataset derived from physical modeling simulations, which can serve the community in evaluating the performance of data-driven methodologies, such as machine learning, in estimating spatial PM2.5 concentrations.
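
The testbed idea, training only on grid cells that hold monitors and then scoring predictions on every other cell against the simulated field, can be illustrated with a toy example. This is a hedged sketch, not the study's pipeline: the cell names, concentrations, and the constant-value predictor are invented for illustration and only show how urban-heavy training samples can produce a positive bias downwind.

```python
from statistics import mean
from typing import Callable, Dict, Set


def unmonitored_bias(
    simulated: Dict[str, float],
    predict: Callable[[str], float],
    monitor_cells: Set[str],
) -> float:
    """Mean bias (prediction minus simulated truth) over cells without monitors."""
    unmonitored = [cell for cell in simulated if cell not in monitor_cells]
    return mean(predict(cell) - simulated[cell] for cell in unmonitored)


# Hypothetical simulated PM2.5 field in µg m^-3: monitors sit only in urban cells.
simulated_field = {"urban_a": 80.0, "urban_b": 70.0, "downwind_a": 40.0, "downwind_b": 30.0}
monitors = {"urban_a", "urban_b"}

# Toy "model": predicts the mean of the monitored (training) cells everywhere.
training_mean = mean(simulated_field[cell] for cell in monitors)
bias = unmonitored_bias(simulated_field, lambda _cell: training_mean, monitors)
```

In this toy setup the training mean is 75 µg m^-3, so the downwind cells are overestimated by 40 µg m^-3 on average; a real evaluation would replace the constant predictor with the retrieval model under test.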

Citations: 0

Water vapor Raman-lidar observations from multiple sites in the framework of WaLiNeAs

IF 11.4 Zone 1 (Earth Sciences) Q1 GEOSCIENCES, MULTIDISCIPLINARY

Earth System Science Data

Pub Date : 2024-05-15 DOI: 10.5194/essd-2024-73

Frédéric Laly, Patrick Chazette, Julien Totems, Jérémy Lagarrigue, Laurent Forges, Cyrille Flamant

Abstract. During the Water Vapor Lidar Network Assimilation (WaLiNeAs) campaign, 8 lidars specifically designed to measure water vapor mixing ratio (WVMR) profiles were deployed on the western Mediterranean coast. The main objectives were to investigate the water vapor content during case studies of heavy precipitation events in the coastal Western Mediterranean and to assess the impact of WVMR data with high spatio-temporal resolution on numerical weather prediction forecasts by means of state-of-the-art assimilation techniques. Given the increasing occurrence of extreme events due to climate change, WaLiNeAs is the first program in Europe to provide network-like, simultaneous and continuous water vapor profile measurements. This paper focuses on the WVMR profiling datasets obtained from three of the lidars managed by the French component of the WaLiNeAs team. These lidars were deployed in the towns of Coursan, Grau du Roi and Cannes. This measurement setup enabled monitoring of the water vapor content within the lower troposphere over a period of three months in autumn–winter 2022 and four months in summer 2023. The lidars measured the WVMR profiles from the surface up to approximately 6–10 km at night and 1–2 km during daytime, with a vertical resolution of 100 m and a time sampling of 15–30 min, selected to meet the needs of weather forecasting with an uncertainty lower than 0.4 g kg⁻¹. The paper presents details about the instruments, the experimental strategy, and the datasets, which are provided in NetCDF format. The final dataset is divided into two datasets: the first, with a time resolution of 15 min, contains a total of 26 423 WVMR vertical profiles; the second has a time resolution of 30 min to improve the signal-to-noise ratio and the signal altitude range.

Citations: 0