Wuxin Wang, Weicheng Ni, Tao Han, Lei Bai, Boheng Duan, Kaijun Ren
DABench: A Benchmark Dataset for Data-Driven Weather Data Assimilation
arXiv - PHYS - Atmospheric and Oceanic Physics, 2024-08-21. https://doi.org/arxiv-2408.11438
Abstract
Recent advancements in deep learning (DL) have led to the development of
several Large Weather Models (LWMs) that rival state-of-the-art (SOTA)
numerical weather prediction (NWP) systems. To date, these models still rely
on traditional NWP-generated analysis fields as input and are far from being
autonomous systems. While researchers are exploring data-driven data
assimilation (DA) models to generate accurate initial fields for LWMs, the lack
of a standard benchmark impedes fair comparison among different data-driven
DA algorithms. Here, we introduce DABench, a benchmark dataset utilizing ERA5
data as ground truth to guide the development of end-to-end data-driven weather
prediction systems. DABench contributes four standard features: (1) sparse and
noisy simulated observations generated following the observing system
simulation experiment (OSSE) method; (2) a skillful pre-trained weather prediction
model to generate background fields while fairly evaluating the impact of
assimilation outcomes on predictions; (3) standardized evaluation metrics for
model comparison; (4) a strong baseline called the DA Transformer (DaT). DaT
integrates prior knowledge from four-dimensional variational (4DVar) DA into
the Transformer model and outperforms 4DVarNet, the SOTA in physical state
reconstruction. Furthermore, we exemplify the development of an end-to-end
data-driven weather prediction system by integrating DaT with the prediction
model. Researchers can leverage DABench to develop their models and compare
performance against established baselines, which will benefit future
advances in data-driven weather prediction systems. The code is available
in this GitHub repository and the dataset is available on Baidu Drive.
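
The first feature listed in the abstract, sparse and noisy simulated observations derived from a ground-truth field, can be illustrated with a minimal sketch. It is not DABench's actual procedure: the toy analytic field standing in for an ERA5 variable, the uniform random choice of observed grid points, the Gaussian noise model, and the names `simulate_observations`, `obs_fraction`, and `noise_std` are all assumptions made here for illustration.

```python
import math
import random


def simulate_observations(truth, obs_fraction=0.1, noise_std=0.5, seed=0):
    """Return (obs, mask) for a 2D truth field given as a list of rows.

    mask[i][j] is True where a grid point is observed. obs[i][j] holds the
    truth value plus Gaussian noise at observed points and None elsewhere.
    All parameters are hypothetical, not DABench settings.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(truth), len(truth[0])
    n_points = n_rows * n_cols
    n_obs = max(1, round(obs_fraction * n_points))
    # Pick which flattened grid indices are observed (the sparsity step).
    observed = sorted(rng.sample(range(n_points), n_obs))

    obs = [[None] * n_cols for _ in range(n_rows)]
    mask = [[False] * n_cols for _ in range(n_rows)]
    for idx in observed:
        i, j = divmod(idx, n_cols)
        # Perturb the truth with zero-mean Gaussian error (the noise step).
        obs[i][j] = truth[i][j] + rng.gauss(0.0, noise_std)
        mask[i][j] = True
    return obs, mask


def rmse_at_observed(truth, obs, mask):
    """Root-mean-square difference between obs and truth at observed points."""
    squared_errors = [
        (obs[i][j] - truth[i][j]) ** 2
        for i in range(len(truth))
        for j in range(len(truth[0]))
        if mask[i][j]
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy "truth" field standing in for one gridded variable (assumption).
truth = [
    [math.sin(i / 4.0) + math.cos(j / 6.0) for j in range(32)]
    for i in range(16)
]
obs, mask = simulate_observations(truth, obs_fraction=0.1, noise_std=0.5, seed=42)
n_observed = sum(sum(row) for row in mask)
print(n_observed, "of", 16 * 32, "grid points observed")
print("observation RMSE:", round(rmse_at_observed(truth, obs, mask), 3))
```

In a setup of this kind, a DA model would receive `obs` and `mask` together with a background field and be scored against `truth`; the RMSE helper here only measures the injected observation error, not any assimilation skill.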