Earth System Science Data: latest articles (page 2)

High-spatiotemporal reconstruction of biogeochemical dynamics in Australia integrating satellites products and in-situ observations (2000–2022)

IF 11.4 Zone 1 (Earth Sciences) Q1 GEOSCIENCES, MULTIDISCIPLINARY

Earth System Science Data

Pub Date : 2024-07-02 DOI: 10.5194/essd-2024-219

Xiaohan Zhang, Lizhe Wang, Jining Yan, Sheng Wang

Abstract. The marine biogeochemical time-series products, which include total alkalinity, inorganic carbon, nitrate, phosphate, silicate, and pH, constitute a foundational support mechanism for the ongoing surveillance of oceanic biogeochemical changes. These products play a critical role in facilitating research focused on dynamic monitoring of marine ecosystems and fostering sustainable oceanic development. However, existing monitoring methodologies are hampered by inherent limitations, notably the paucity of observational products that simultaneously offer high spatial and temporal resolutions. Furthermore, the interpolation methods typically employed in these contexts frequently prove ineffective on a large scale, resulting in data with extensive temporal and spatial expanses that are difficult to use in applications aimed at monitoring large-scale ocean dynamics. A novel integration of the CANYON-B and Random Forest regression methods was explored to address these challenges in reconstructing key marine biogeochemical parameters. This work reconstructs the concentrations of these marine biogeochemicals at the sea surface within Australia's Exclusive Economic Zone over the period from 2000 to 2022 on a 1-kilometre scale. The approach involves the amalgamation of multi-source in-situ ocean chemistry time-series observations with MODIS Terra ocean reflectance imagery and ocean water colour product distributions. This research highlights the substantial capabilities of machine learning for the large-scale reconstruction of ocean chemistry data, introducing a new, viable method for utilising in-situ measurements and optical imagery in reconstructing marine biogeochemical elements, thereby significantly enhancing our ability to monitor large-scale ocean dynamics. The datasets generated and analysed in this study are available on Science Data Bank (https://doi.org/10.57760/sciencedb.09331) (Zhang et al., 2024).



Citations: 0

Permafrost temperature baseline at 15 meters depth in the Qinghai-Tibet Plateau (2010–2019)

IF 11.4 Zone 1 (Earth Sciences) Q1 GEOSCIENCES, MULTIDISCIPLINARY

Earth System Science Data

Pub Date : 2024-07-01 DOI: 10.5194/essd-2024-114

Defu Zou, Lin Zhao, Guojie Hu, Erji Du, Guangyue Liu, Chong Wang, Wangping Li

Abstract. The ground temperature at a fixed depth is a crucial boundary condition for understanding the properties of deep permafrost. However, the commonly used mean annual ground temperature at the depth of the zero annual amplitude (MAGT_dzaa) has application limitations due to large spatial heterogeneity in observed depths. In this study, we utilized 231 borehole records of mean annual ground temperature at a depth of 15 meters (MAGT_15m) from 2010 to 2019 and employed support vector regression (SVR) to predict gridded MAGT_15m data at a spatial resolution of nearly 1 km across the Qinghai-Tibet Plateau (QTP). SVR predictions demonstrated an R² value of 0.48 with a negligible negative bias (-0.01 °C). The average MAGT_15m of the QTP permafrost was -1.85 °C (±1.58 °C), with 90% of values ranging from -5.1 °C to -0.1 °C and 51.2% exceeding -1.5 °C. Freezing degree days (FDD) was the most significant predictor (p<0.001) of MAGT_15m, followed by thawing degree days (TDD), mean annual precipitation (MAP), and soil bulk density (BD) (p<0.01). Overall, the MAGT_15m increased from northwest to southeast and decreased with elevation. Lower MAGT_15m values prevail in high mountainous areas with steep slopes. The MAGT_15m was the lowest in the basins of the Amu Darya, Indus, and Tarim rivers (-2.7 to -2.9 °C) and the highest in the Yangtze and Yellow River basins (-0.8 to -0.9 °C). The baseline dataset of MAGT_15m during 2010–2019 for the QTP permafrost will facilitate simulations of deep permafrost characteristics and provide fundamental data for permafrost model validation and improvement.
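The two accuracy figures quoted in this abstract (an R² value and a mean offset of -0.01 °C) can be illustrated with a minimal Python sketch. It assumes the usual definitions, R² = 1 − SS_res/SS_tot and mean offset = average of (predicted − observed); the abstract does not say exactly how the authors computed them, and the borehole temperatures below are invented for illustration only.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mean_bias(observed, predicted):
    """Mean of (predicted - observed); negative means underestimation (assumed convention)."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical MAGT values in degrees Celsius, not taken from the dataset.
observed = [-2.0, -1.0, -3.0, -0.5]
predicted = [-1.8, -1.4, -2.6, -0.7]

print("R2:", round(r_squared(observed, predicted), 3))
print("mean bias (deg C):", round(mean_bias(observed, predicted), 3))
```

A mean offset near zero together with a moderate R² would indicate predictions that are unbiased on average but still scatter noticeably around the observations.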



Citations: 0

A global forest burn severity dataset from Landsat imagery (2003–2016)

IF 11.4 Zone 1 (Earth Sciences) Q1 GEOSCIENCES, MULTIDISCIPLINARY

Earth System Science Data

Pub Date : 2024-07-01 DOI: 10.5194/essd-16-3061-2024

Kang He, Xinyi Shen, Emmanouil N. Anagnostou

Abstract. Forest fires, while destructive and dangerous, are important to the functioning and renewal of ecosystems. Over the past 2 decades, large-scale, severe forest fires have become more frequent globally, and the risk is expected to increase as fire weather and drought conditions intensify. To improve quantification of the intensity and extent of forest fire damage, we have developed a 30 m resolution global forest burn severity (GFBS) dataset of the degree of biomass consumed by fires from 2003 to 2016. To develop this dataset, we used the Global Fire Atlas product to determine when and where forest fires occurred during that period and then we overlaid the available Landsat surface reflectance products to obtain pre-fire and post-fire normalized burn ratios (NBRs) for each burned pixel, designating the difference between them as dNBR and the relative difference as RdNBR. We compared the GFBS dataset against the Canada Landsat Burned Severity (CanLaBS) product, showing better agreement than the existing Moderate Resolution Imaging Spectroradiometer (MODIS)-based global burn severity dataset (MOdis burn SEVerity, MOSEV) in representing the distribution of forest burn severity over Canada. Using the in situ burn severity category data available for the 2013 wildfires in southeastern Australia, we demonstrated that GFBS could provide burn severity estimation with clearer differentiation between the high-severity and moderate-/low-severity classes, while such differentiation among the in situ burn severity classes is not captured in the MOSEV product. Using the CONUS-wide composite burn index (CBI) as a ground truth, we showed that dNBR from GFBS was more strongly correlated with CBI (r=0.63) than dNBR from MOSEV (r=0.28). RdNBR from GFBS also exhibited better agreement with CBI (r=0.56) than RdNBR from MOSEV (r=0.20).
On a global scale, while the dNBR and RdNBR spatial patterns extracted by GFBS are similar to those of MOSEV, MOSEV tends to provide higher burn severity levels than GFBS. We attribute this difference to variations in reflectance values and the different spatial resolutions of the two satellites. The GFBS dataset provides a more precise and reliable assessment of burn severity than existing available datasets. These enhancements are crucial for understanding the ecological impacts of forest fires and for informing management and recovery efforts in affected regions worldwide. The GFBS dataset is freely accessible at https://doi.org/10.5281/zenodo.10037629 (He et al., 2023).
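The burn-severity indices named in this abstract can be illustrated with a short Python sketch. It assumes the widely used definitions NBR = (NIR − SWIR2) / (NIR + SWIR2), dNBR = pre-fire NBR minus post-fire NBR, and RdNBR = dNBR divided by the square root of the absolute pre-fire NBR. The abstract does not state the exact band choice or scaling the authors applied, so this is a sketch under those assumptions, and the reflectance values are invented.

```python
import math


def nbr(nir, swir2):
    """Normalized burn ratio from near-infrared and shortwave-infrared reflectance."""
    return (nir - swir2) / (nir + swir2)


def dnbr(nbr_pre, nbr_post):
    """Differenced NBR: positive values indicate vegetation loss after the fire."""
    return nbr_pre - nbr_post


def rdnbr(nbr_pre, nbr_post):
    """Relative dNBR (unscaled form, assumed); NaN when pre-fire NBR is zero."""
    if nbr_pre == 0:
        return math.nan
    return dnbr(nbr_pre, nbr_post) / math.sqrt(abs(nbr_pre))


# Hypothetical surface reflectances for one burned pixel.
pre = nbr(nir=0.5, swir2=0.3)
post = nbr(nir=0.3, swir2=0.5)

print("pre-fire NBR:", pre)
print("post-fire NBR:", post)
print("dNBR:", dnbr(pre, post))
print("RdNBR:", rdnbr(pre, post))
```

Dividing by the square root of pre-fire NBR normalizes the change by how much vegetation signal was present before the fire, which is why the relative index can rank sparsely vegetated pixels differently from dNBR.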



Citations: 0

SAR Image Semantic Segmentation of Typical Oceanic and Atmospheric Phenomena

IF 11.4 Zone 1 (Earth Sciences) Q1 GEOSCIENCES, MULTIDISCIPLINARY

Earth System Science Data

Pub Date : 2024-07-01 DOI: 10.5194/essd-2024-222

Quankun Li, Xue Bai, Lizhen Hu, Liangsheng Li, Yaohui Bao, Xupu Geng, Xiao-Hai Yan

Abstract. The ocean surface exhibits a variety of oceanic and atmospheric phenomena. Automatically detecting and identifying these phenomena is crucial for understanding oceanic dynamics and ocean-atmosphere interactions. In this study, we select 2,383 Sentinel-1 WV mode images and 2,628 IW mode sub-images to construct a semantic segmentation dataset that includes 12 typical oceanic and atmospheric phenomena. Each phenomenon is represented by approximately 400 sub-images, resulting in a total of 5,011 images. The images in this dataset have a resolution of 100 meters and dimensions of 256 × 256 pixels. We propose a modified Segformer model to semantically segment these multiple categories of oceanic and atmospheric phenomena. Experimental results show that the modified Segformer model achieves an average Dice coefficient of 80.98 %, an average IoU of 70.32 %, and an overall accuracy of 87.13 %, demonstrating robust segmentation performance on typical oceanic and atmospheric phenomena in SAR images.
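The Dice coefficient and IoU reported in this abstract can be illustrated for a single class with a small Python sketch. It assumes the standard per-class definitions Dice = 2TP / (2TP + FP + FN) and IoU = TP / (TP + FP + FN); how the authors averaged over the 12 classes is not stated in the abstract, and the tiny label masks below are made up.

```python
def dice_and_iou(pred, truth, label):
    """Return (Dice, IoU) for one class label, or None if the class is absent from both masks."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == label and t == label:
                tp += 1  # pixel correctly assigned to the class
            elif p == label:
                fp += 1  # predicted as the class, but truth differs
            elif t == label:
                fn += 1  # class pixel that the prediction missed
    if tp + fp + fn == 0:
        return None  # both metrics are undefined for an absent class
    dice = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return dice, iou


# Hypothetical 2 x 3 label masks (0 = background, 1 = some phenomenon).
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]

print(dice_and_iou(pred, truth, label=1))
```

For any single class Dice is never lower than IoU, which is consistent with the abstract reporting a higher average Dice than average IoU.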


Citations: 0

Physical, social, and biological attributes for improved understanding and prediction of wildfires: FPA FOD-Attributes dataset

IF 11.4 Zone 1 (Earth Sciences) Q1 GEOSCIENCES, MULTIDISCIPLINARY

Earth System Science Data

Pub Date : 2024-06-28 DOI: 10.5194/essd-16-3045-2024

Yavar Pourmohamad, John T. Abatzoglou, Erin J. Belval, Erica Fleishman, Karen Short, Matthew C. Reeves, Nicholas Nauslar, Philip E. Higuera, Eric Henderson, Sawyer Ball, Amir AghaKouchak, Jeffrey P. Prestemon, Julia Olszewski, Mojtaba Sadegh

Abstract. Wildfires are increasingly impacting social and environmental systems in the United States (US). The ability to mitigate the adverse effects of wildfires increases with understanding of the social, physical, and biological conditions that co-occurred with or caused the wildfire ignitions and contributed to the wildfire impacts. To this end, we developed the FPA FOD-Attributes dataset, which augments the sixth version of the Fire Program Analysis Fire-Occurrence Database (FPA FOD v6) with nearly 270 attributes that coincide with the date and location of each wildfire ignition in the US. FPA FOD v6 contains information on location, jurisdiction, discovery time, cause, and final size of >2.3×10⁶ wildfires in the US between 1992 and 2020. For each wildfire, we added physical (e.g., weather, climate, topography, and infrastructure), biological (e.g., land cover and normalized difference vegetation index), social (e.g., population density and social vulnerability index), and administrative (e.g., national and regional preparedness level and jurisdiction) attributes. This publicly available dataset can be used to answer numerous questions about the covariates associated with human- and lightning-caused wildfires. Furthermore, the FPA FOD-Attributes dataset can support descriptive, diagnostic, predictive, and prescriptive wildfire analytics, including the development of machine learning models. The FPA FOD-Attributes dataset is available at https://doi.org/10.5281/zenodo.8381129 (Pourmohamad et al., 2023).



Citations: 0

Dataset of spatially extensive long-term quality-assured land–atmosphere interactions over the Tibetan Plateau

IF 11.4 Zone 1 (Earth Sciences) Q1 GEOSCIENCES, MULTIDISCIPLINARY

Earth System Science Data

Pub Date : 2024-06-28 DOI: 10.5194/essd-16-3017-2024

Yaoming Ma, Zhipeng Xie, Yingying Chen, Shaomin Liu, Tao Che, Ziwei Xu, Lunyu Shang, Xiaobo He, Xianhong Meng, Weiqiang Ma, Baiqing Xu, Huabiao Zhao, Junbo Wang, Guangjian Wu, Xin Li

Abstract. The climate of the Tibetan Plateau (TP) has experienced substantial changes in recent decades as a result of the location's susceptibility to global climate change. The changes observed across the TP are closely associated with regional land–atmosphere interactions. Current models and satellites struggle to accurately depict the interactions; therefore, critical field observations on land–atmosphere interactions outlined here provide necessary independent validation data and fine-scale process insights for constraining reanalysis products, remote sensing retrievals, and land surface model parameterizations. Scientific data sharing is crucial for the TP since in situ observations are rarely available under these harsh conditions. However, field observations are currently dispersed among individuals or groups and have not yet been integrated for comprehensive analysis. This has prevented a better understanding of the interactions, the unprecedented changes they generate, and the substantial ecological and environmental consequences they bring about. In this study, we collaborated with different agencies and organizations to present a comprehensive dataset for hourly measurements of surface energy balance components, soil hydrothermal properties, and near-surface micrometeorological conditions spanning up to 17 years (2005–2021). This dataset, derived from 12 field stations covering a variety of typical TP landscapes, provides the most extensive in situ observation data available for studying land–atmosphere interactions on the TP to date in terms of both spatial coverage and duration. Three categories of observations are provided in this dataset: meteorological gradient data (met), soil hydrothermal data (soil), and turbulent flux data (flux). To assure data quality, a set of rigorous data-processing and quality control procedures is implemented for all observation elements (e.g., wind speed and direction at different heights) in this dataset. 
The operational workflow and procedures are individually tailored to the varied types of elements at each station, including automated error screening, manual inspection, diagnostic checking, adjustments, and quality flagging. The hourly raw data series; the quality-assured data; and supplementary information, including data integrity and the percentage of correct data on a monthly scale, are provided via the National Tibetan Plateau Data Center (https://doi.org/10.11888/Atmos.tpdc.300977, Ma et al., 2023a). With the greatest number of stations covered, the fullest collection of meteorological elements, and the longest duration of observations and recordings to date, this dataset is the most extensive hourly land–atmosphere interaction observation dataset for the TP. It will serve as the benchmark for evaluating and refining land surface models, reanalysis products, and remote sensing retrievals, as well as for characterizing fine-scale land–atmosphere interaction processes of the TP and underlying



Citations: 0

MUDA: dynamic geophysical and geochemical MUltiparametric DAtabase

IF 11.4 Zone 1 (Earth Sciences) Q1 GEOSCIENCES, MULTIDISCIPLINARY

Earth System Science Data

Pub Date : 2024-06-27 DOI: 10.5194/essd-2024-185

Marco Massa, Andrea Luca Rizzo, Davide Scafidi, Elisa Ferrari, Sara Lovati, Lucia Luzi, the MUDA working group

Abstract. This paper presents the new dynamic geophysical and geochemical MUltiparametric DAtabase (MUDA). MUDA is a new infrastructure of the National Institute of Geophysics and Volcanology (INGV), published online in December 2023, that archives and disseminates multiparametric data collected by multidisciplinary monitoring networks. MUDA is a MySQL relational database with a web interface developed in PHP, designed for investigating, in quasi-real time, possible correlations between seismic phenomena and variations in endogenous and environmental parameters. At present, MUDA collects data from several types of sensors: hydrogeochemical probes measuring physical-chemical parameters in waters, meteorological stations, detectors of air radon concentration and of diffusive carbon dioxide (CO₂) flux, and seismometers belonging both to the National Seismic Network of INGV and to temporary networks installed in the framework of multidisciplinary research projects. Each day, MUDA publishes data updated to the previous day and lets users view and download multiparametric time series for selected time periods. The resulting dataset supports future high-frequency, continuous multiparametric monitoring and offers a starting point for identifying possible seismic precursors for short-term earthquake forecasting. MUDA can be cited with the digital object identifier https://doi.org/10.13127/muda (Massa et al., 2023).
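
To make the idea of a relational store for multiparametric time series more concrete, here is a minimal sketch. It is not MUDA's schema: the table names, column names, station, parameter labels and values are all invented for illustration, and Python's built-in sqlite3 stands in for MySQL so the example runs without a server. It only shows how one parameter at one station might be selected for a chosen time period.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not taken from MUDA.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE station (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE measurement (
    station_id INTEGER NOT NULL REFERENCES station(id),
    parameter  TEXT NOT NULL,   -- e.g. 'radon', 'water_temp'
    ts         TEXT NOT NULL,   -- ISO 8601 timestamp, sorts chronologically
    value      REAL NOT NULL
);
CREATE INDEX idx_measurement ON measurement (station_id, parameter, ts);
""")

# Placeholder rows, invented purely to exercise the query.
conn.execute("INSERT INTO station VALUES (1, 'DEMO')")
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?, ?)",
    [
        (1, "radon", "2024-01-01T00:00:00", 41.0),
        (1, "radon", "2024-01-02T00:00:00", 43.5),
        (1, "water_temp", "2024-01-02T00:00:00", 12.1),
        (1, "radon", "2024-01-03T00:00:00", 40.2),
    ],
)


def time_series(conn, station_id, parameter, start, end):
    """Return (timestamp, value) pairs with start <= ts < end, oldest first."""
    cur = conn.execute(
        "SELECT ts, value FROM measurement "
        "WHERE station_id = ? AND parameter = ? AND ts >= ? AND ts < ? "
        "ORDER BY ts",
        (station_id, parameter, start, end),
    )
    return cur.fetchall()


series = time_series(conn, 1, "radon", "2024-01-02T00:00:00", "2024-01-04T00:00:00")
print(series)
```

Storing one row per station, parameter and timestamp (a "long" layout) is one common choice because new sensor types need no schema change; whether MUDA does this is not stated in the abstract.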



Citations: 0

Characterizing clouds with the CCClim dataset, a machine learning cloud class climatology

IF 11.4 Zone 1 (Earth Sciences) Q1 GEOSCIENCES, MULTIDISCIPLINARY

Earth System Science Data

Pub Date : 2024-06-27 DOI: 10.5194/essd-16-3001-2024

Arndt Kaps, Axel Lauer, Rémi Kazeroni, Martin Stengel, Veronika Eyring

Abstract. We present the new Cloud Class Climatology (CCClim) dataset, quantifying the global distribution of established morphological cloud types over 35 years. CCClim combines active and passive sensor data with machine learning (ML) and provides a new opportunity for improving the understanding of clouds and their related processes. CCClim is based on cloud property retrievals from the European Space Agency's (ESA) Cloud_cci dataset, adding relative occurrences of eight major cloud types, designed to be similar to those defined by the World Meteorological Organization (WMO), at 1° resolution. The ML framework used to obtain the cloud types is trained on data from multiple satellites in the afternoon constellation (A-Train). Using multiple spaceborne sensors reduces the impact of single-sensor problems, such as the difficulty passive sensors have in detecting thin cirrus or the small footprint of active sensors. We leverage this to generate sufficient labeled data to train supervised ML models. CCClim's almost gapless global coverage from 1982 to 2016 allows process-oriented analyses of clouds on a climatological timescale. Similarly, the moderate spatial and temporal resolutions make it a lightweight dataset while enabling straightforward comparison to climate models. CCClim creates multiple opportunities to study clouds, of which we sketch out a few examples. Along with the cloud-type frequencies, CCClim contains the cloud properties used as inputs to the ML framework, such that all cloud types can be associated with relevant physical quantities. CCClim can also be combined with other datasets such as reanalysis data to assess the dynamical regime favoring the occurrence of a specific cloud type in association with its properties. Additionally, we show an example of how to evaluate a global climate model by comparing CCClim with cloud types obtained by applying the same ML method used to create CCClim to output from the icosahedral nonhydrostatic atmosphere model (ICON-A). CCClim can be accessed via the following digital object identifier: https://doi.org/10.5281/zenodo.8369202 (Kaps et al., 2023b).
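
As a rough illustration of what "relative occurrence of eight cloud types at 1° resolution" means as a data structure, the sketch below bins labeled samples into 1° cells and reports the fraction of samples per class. This is not the CCClim method, which predicts the relative occurrences with an ML model from Cloud_cci cloud properties; the function names, the integer class ids 0 to 7 and the demo samples are assumptions made only for this example.

```python
import math
from collections import Counter, defaultdict

N_CLASSES = 8  # eight cloud types per the abstract; ids 0..7 are an assumption


def grid_cell(lat, lon):
    """Map a coordinate to the lower-left corner of its 1-degree cell."""
    return (math.floor(lat), math.floor(lon))


def relative_occurrence(samples):
    """samples: iterable of (lat, lon, class_id).

    Returns {cell: [fraction per class]}; the fractions in a cell sum to 1.
    """
    counts = defaultdict(Counter)
    for lat, lon, cls in samples:
        counts[grid_cell(lat, lon)][cls] += 1
    result = {}
    for cell, counter in counts.items():
        total = sum(counter.values())
        result[cell] = [counter[c] / total for c in range(N_CLASSES)]
    return result


# Invented demo samples: four fall in cell (10, 20), one in cell (-1, 20).
demo = [
    (10.2, 20.7, 0),
    (10.9, 20.1, 0),
    (10.5, 20.5, 3),
    (10.4, 20.4, 3),
    (-0.5, 20.5, 7),
]
freq = relative_occurrence(demo)
print(freq[(10, 20)])
```

Keeping a full vector of fractions per cell, rather than a single dominant class, is what lets each cloud type be related to the co-located physical quantities the abstract mentions.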



Citations: 0

ChatEarthNet: A Global-Scale Image-Text Dataset Empowering Vision-Language Geo-Foundation Models

IF 11.4 Zone 1 (Earth Sciences) Q1 GEOSCIENCES, MULTIDISCIPLINARY

Earth System Science Data

Pub Date : 2024-06-27 DOI: 10.5194/essd-2024-140

Zhenghang Yuan, Zhitong Xiong, Lichao Mou, Xiao Xiang Zhu

Abstract. The rapid development of remote sensing technology has led to exponential growth in satellite images, yet their inherent complexity often makes them difficult for non-expert users to understand. Natural language, as a carrier of human knowledge, can bridge the gap between general users and complex satellite imagery. Additionally, when paired with visual data, natural language can be used to train large vision-language foundation models, significantly improving performance in various tasks. Despite these advancements, the remote sensing community still faces a challenge due to the lack of large-scale, high-quality vision-language datasets for satellite images. To address this challenge, we introduce a new image-text dataset, providing high-quality natural language descriptions for global-scale satellite data. Specifically, we use Sentinel-2 data, chosen for its global coverage, as the foundational image source, and employ semantic segmentation labels from the European Space Agency’s WorldCover project to enrich the descriptions of land covers. By conducting in-depth semantic analysis, we formulate detailed prompts to elicit rich descriptions from ChatGPT. We then add a manual verification step, involving manual inspection and correction, to further improve the dataset’s quality. Finally, we offer the community ChatEarthNet, a large-scale image-text dataset characterized by global coverage, high quality, wide-ranging diversity, and detailed descriptions. ChatEarthNet consists of 163,488 image-text pairs with captions generated by ChatGPT-3.5 and an additional 10,000 image-text pairs with captions generated by ChatGPT-4V(ision). This dataset has significant potential for both training and evaluating vision-language geo-foundation models for remote sensing. The code is publicly available at https://doi.org/10.5281/zenodo.11004358 (Yuan et al., 2024b), and the ChatEarthNet dataset is at https://doi.org/10.5281/zenodo.11003436 (Yuan et al., 2024c).
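
The step from segmentation labels to a text prompt can be pictured with a small sketch. It is not the authors' pipeline (their code is at the DOI above): the prompt wording, function names and the 4×4 demo mask are invented, and the class table follows the WorldCover legend as commonly documented, so it should be checked against the product manual before reuse. The sketch summarizes a label mask as land cover shares and turns them into a prompt string.

```python
from collections import Counter

# WorldCover class codes and names (verify against the official legend).
WORLDCOVER = {
    10: "tree cover",
    20: "shrubland",
    30: "grassland",
    40: "cropland",
    50: "built-up",
    60: "bare / sparse vegetation",
    70: "snow and ice",
    80: "permanent water bodies",
    90: "herbaceous wetland",
    95: "mangroves",
    100: "moss and lichen",
}


def class_shares(mask):
    """mask: 2-D list of class codes. Return [(name, percent)], largest first."""
    counts = Counter(code for row in mask for code in row)
    total = sum(counts.values())
    shares = [(WORLDCOVER[code], 100.0 * n / total) for code, n in counts.items()]
    return sorted(shares, key=lambda item: item[1], reverse=True)


def build_prompt(mask):
    """Format the land cover shares as a hypothetical captioning prompt."""
    lines = [f"- {name}: {pct:.0f}% of the image" for name, pct in class_shares(mask)]
    return (
        "Describe a satellite image with the following land cover composition:\n"
        + "\n".join(lines)
    )


# Invented 4x4 label mask: 8 tree, 4 cropland, 3 water, 1 built-up pixel.
demo_mask = [
    [10, 10, 40, 40],
    [10, 10, 40, 80],
    [10, 10, 40, 80],
    [10, 10, 50, 80],
]
print(build_prompt(demo_mask))
```

A real prompt would likely also carry spatial layout information, which class shares alone do not capture.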



Citations: 0

Atmospheric Radiation Measurement (ARM) airborne field campaign data products between 2013 and 2018

IF 11.4 Zone 1 (Earth Sciences) Q1 GEOSCIENCES, MULTIDISCIPLINARY

Earth System Science Data

Pub Date : 2024-06-27 DOI: 10.5194/essd-2024-97

Fan Mei, Jennifer M. Comstock, Mikhail S. Pekour, Jerome D. Fast, Beat Schmid, Krista L. Gaustad, Shuaiqi Tang, Damao Zhang, John E. Shilling, Jason Tomlinson, Adam C. Varble, Jian Wang, L. Ruby Leung, Lawrence Kleinman, Scot Martin, Sebastien C. Biraud, Brian D. Ermold, Kenneth W. Burk

Abstract. Airborne measurements are pivotal for providing detailed, spatiotemporally resolved information about atmospheric parameters, and aerosol and cloud properties, thereby enhancing our understanding of dynamic atmospheric processes. For 30 years, the U.S. Department of Energy (DOE) Office of Science supported an instrumented Gulfstream-1 (G-1) aircraft for atmospheric field campaigns. Data from the final decade of G-1 operations were archived by the Atmospheric Radiation Measurement (ARM) user facility Data Center and made publicly available at no cost to all registered users. To ensure a consistent data format and to improve the accessibility of the ARM airborne data, an integrated dataset was recently developed covering the final six years of G-1 operations (2013 to 2018). The integrated dataset includes data collected from 236 flights (766.4 hours), which covered the Arctic, the U.S. Southern Great Plains (SGP), the U.S. West Coast, the Eastern North Atlantic (ENA), the Amazon Basin in Brazil, and the Sierras de Córdoba range in Argentina. These comprehensive data streams provide much-needed insight into spatiotemporal variability of thermodynamic quantities, aerosol and cloud states and properties for addressing essential science questions in Earth system process studies. This manuscript describes the DOE ARM merged G-1 datasets, including information on the acquisition, collection, and quality control processes. It further illustrates the usage of this merged dataset to evaluate the Energy Exascale Earth System Model (E3SM) with the Earth System Model Aerosol-Cloud Diagnostics (ESMAC Diags) package.



Citations: 0