DABench: A Benchmark Dataset for Data-Driven Weather Data Assimilation

Wuxin Wang, Weicheng Ni, Tao Han, Lei Bai, Boheng Duan, Kaijun Ren
arXiv - PHYS - Atmospheric and Oceanic Physics, published 2024-08-21. arXiv:2408.11438 (https://doi.org/arxiv-2408.11438)

Abstract

Recent advances in deep learning (DL) have produced several Large Weather Models (LWMs) that rival state-of-the-art (SOTA) numerical weather prediction (NWP) systems. However, these models still rely on analysis fields generated by traditional NWP as input and are therefore far from autonomous. Although researchers are exploring data-driven data assimilation (DA) models to generate accurate initial fields for LWMs, the lack of a standard benchmark impedes fair evaluation of different data-driven DA algorithms. Here, we introduce DABench, a benchmark dataset that uses ERA5 data as ground truth to guide the development of end-to-end data-driven weather prediction systems. DABench contributes four standard features: (1) sparse and noisy simulated observations generated following the observing system simulation experiment (OSSE) method; (2) a skillful pre-trained weather prediction model that generates background fields and allows the impact of assimilation outcomes on predictions to be evaluated fairly; (3) standardized evaluation metrics for model comparison; and (4) a strong baseline called the DA Transformer (DaT). DaT integrates prior knowledge from four-dimensional variational (4DVar) DA into the Transformer model and outperforms 4DVarNet, the SOTA model for physical state reconstruction. Furthermore, we exemplify the development of an end-to-end data-driven weather prediction system by integrating DaT with the prediction model. Researchers can use DABench to develop their models and compare performance against established baselines, which will benefit future advances in data-driven weather prediction systems. The code is available on this GitHub repository and the dataset is available on Baidu Drive.
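
To make feature (1) concrete, the following is a minimal sketch of how sparse and noisy observations can be simulated from a ground-truth field in the spirit of an observing system simulation experiment. It is not the DABench pipeline: the function names `simulate_observations` and `rmse`, the observation fraction, the Gaussian noise level, the toy truth field standing in for an ERA5 variable, and the constant-bias background are all assumptions chosen for illustration.

```python
import numpy as np


def simulate_observations(truth, obs_fraction=0.1, noise_std=0.1, seed=0):
    """Sample a random subset of grid points and perturb them with Gaussian noise.

    Returns (obs, mask): obs holds noisy values where mask is True and NaN elsewhere.
    obs_fraction and noise_std are illustrative values, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    mask = rng.random(truth.shape) < obs_fraction          # sparse coverage
    noise = rng.normal(0.0, noise_std, size=truth.shape)   # observation error
    obs = np.where(mask, truth + noise, np.nan)
    return obs, mask


def rmse(estimate, truth):
    """Unweighted root-mean-square error over the whole grid."""
    return float(np.sqrt(np.mean((estimate - truth) ** 2)))


# Toy "truth" field on a small lat-lon grid (a stand-in for an ERA5 variable).
lat = np.linspace(-np.pi / 2, np.pi / 2, 32)
lon = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
truth = np.cos(lat)[:, None] * np.sin(2.0 * lon)[None, :]

obs, mask = simulate_observations(truth)

# Hypothetical background field: the truth plus a constant bias.
background = truth + 0.3

# Naive direct insertion: overwrite the background wherever an observation exists.
# A learned DA model would replace this step.
analysis = np.where(mask, obs, background)

print("background RMSE:", rmse(background, truth))
print("analysis RMSE:  ", rmse(analysis, truth))
```

Direct insertion is used here only as a placeholder for the assimilation step; comparing the analysis error with the background error against the same truth field is the kind of comparison that a shared benchmark standardizes.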