Wuxin Wang, Weicheng Ni, Tao Han, Lei Bai, Boheng Duan, Kaijun Ren
arXiv - PHYS - Atmospheric and Oceanic Physics, 2024-08-21. DOI: arxiv-2408.11438
Citations: 0
DABench: A Benchmark Dataset for Data-Driven Weather Data Assimilation
Recent advancements in deep learning (DL) have led to the development of
several Large Weather Models (LWMs) that rival state-of-the-art (SOTA)
numerical weather prediction (NWP) systems. To date, these models still rely
on analysis fields generated by traditional NWP systems as input and are far
from autonomous. While researchers are exploring data-driven data
assimilation (DA) models to generate accurate initial fields for LWMs, the lack
of a standard benchmark impedes fair comparison of different data-driven
DA algorithms. Here, we introduce DABench, a benchmark dataset that uses ERA5
data as ground truth to guide the development of end-to-end data-driven weather
prediction systems. DABench provides four standard components: (1) sparse and
noisy simulated observations generated following the observing system
simulation experiment (OSSE) method; (2) a skillful pre-trained weather prediction
model that generates background fields and enables fair evaluation of the impact of
assimilation outcomes on predictions; (3) standardized evaluation metrics for
model comparison; and (4) a strong baseline called the DA Transformer (DaT). DaT
integrates prior knowledge from four-dimensional variational (4D-Var) DA into the
Transformer model and outperforms 4DVarNet, the SOTA method for physical state
reconstruction. We also demonstrate the development of an end-to-end
data-driven weather prediction system by integrating DaT with the prediction
model. Researchers can use DABench to develop their models and compare
performance against established baselines, which should benefit future
advances in data-driven weather prediction systems. The code is available
on GitHub and the dataset is available on Baidu Drive.
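
The abstract's first component, sparse and noisy simulated observations drawn from a known truth, can be illustrated with a minimal sketch. This is not DABench's actual procedure: the toy field, the uniform random sampling, the Gaussian noise model, and the names `simulate_observations`, `observation_rmse`, `obs_fraction`, and `noise_std` are all assumptions made for illustration.

```python
import math
import random


def simulate_observations(truth, obs_fraction=0.1, noise_std=0.5, seed=0):
    """Return a sparse, noisy copy of `truth`; unobserved points are None.

    Hypothetical helper: sampling pattern and noise model are assumptions.
    """
    rng = random.Random(seed)
    obs = []
    for row in truth:
        obs_row = []
        for value in row:
            if rng.random() < obs_fraction:
                # Observed point: truth plus Gaussian observation error.
                obs_row.append(value + rng.gauss(0.0, noise_std))
            else:
                # Unobserved point: no data available here.
                obs_row.append(None)
        obs.append(obs_row)
    return obs


def observation_rmse(obs, truth):
    """Unweighted RMSE between observations and truth, over observed points only."""
    sq_errors = [
        (o - t) ** 2
        for obs_row, truth_row in zip(obs, truth)
        for o, t in zip(obs_row, truth_row)
        if o is not None
    ]
    return math.sqrt(sum(sq_errors) / len(sq_errors))


# Toy smooth field on a 32 x 64 latitude-longitude grid (a stand-in, not ERA5).
n_lat, n_lon = 32, 64
truth = [
    [
        math.cos(math.pi * (i / (n_lat - 1) - 0.5)) * math.sin(4 * math.pi * j / n_lon)
        for j in range(n_lon)
    ]
    for i in range(n_lat)
]

obs = simulate_observations(truth)
n_obs = sum(o is not None for row in obs for o in row)
print(f"observed {n_obs} of {n_lat * n_lon} grid points")
print(f"observation RMSE vs. truth: {observation_rmse(obs, truth):.3f}")
```

Because the truth is known, any field reconstructed from `obs` can be scored against `truth` with the same kind of error measure, which is what makes a fair comparison between DA methods possible in an OSSE-style setup.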