MMSTP: Multi-modal Spatiotemporal Feature Fusion Network for Precipitation Prediction
Tianbao Zhang, Hongbin Wang, D. Niu, Chunlei Shi, Xisong Chen, Yulong Jin
2023 6th International Symposium on Autonomous Systems (ISAS), June 2023. DOI: 10.1109/ISAS59543.2023.10164452
Abstract
Precipitation prediction, and accurate rainstorm warning in particular, is a fundamental research direction for preventing major natural disasters and is of great practical significance. Despite the emergence of numerous deep learning models in recent years, existing CNN-based methods often struggle to extract global spatiotemporal features effectively because of the limited receptive field of convolution kernels, which greatly reduces the models' expressive power. Additionally, relying on radar echo maps as the sole data source limits the accuracy of short-term precipitation prediction. In this work, we introduce MMSTP, a multi-modal spatiotemporal feature fusion framework that exploits multi-modal information from satellite images and radar echo maps. The encoder module of MMSTP is designed to combine the respective strengths of CNNs (local feature extraction) and Transformers (global information perception), and uses a self-attention mechanism both to model temporal features and to perform multi-modal fusion. MMSTP is an end-to-end multi-modal, multi-scale, and multi-frame feature fusion framework that significantly improves the accuracy of short-term precipitation prediction, offering a new approach to spatiotemporal sequence forecasting problems. Experimental results show that MMSTP surpasses state-of-the-art (SOTA) performance on benchmark datasets.
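The paper does not include code, but the abstract's core design (per-modality CNN front-ends for local features, followed by a Transformer whose self-attention jointly models time steps and fuses the two modalities) can be sketched in PyTorch. The following is a minimal illustrative sketch only: all module names (`FrameCNN`, `MultiModalSTEncoder`), channel counts, embedding sizes, and layer depths are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a CNN + Transformer multi-modal encoder in the
# spirit of MMSTP. All names, shapes, and hyperparameters are illustrative
# assumptions, not the authors' implementation.
import torch
import torch.nn as nn


class FrameCNN(nn.Module):
    """Local feature extractor: a small CNN turning each frame into one token."""

    def __init__(self, in_ch: int, embed_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, embed_dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),  # collapse spatial dims -> one token per frame
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels, H, W) -> (batch, time, embed_dim)
        b, t, c, h, w = x.shape
        feats = self.net(x.reshape(b * t, c, h, w)).flatten(1)
        return feats.reshape(b, t, -1)


class MultiModalSTEncoder(nn.Module):
    """CNN front-end per modality, then a Transformer encoder whose
    self-attention attends across both time steps and modalities,
    performing temporal modeling and multi-modal fusion in one pass."""

    def __init__(self, embed_dim: int = 128, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        self.radar_cnn = FrameCNN(in_ch=1, embed_dim=embed_dim)  # radar echo maps
        self.sat_cnn = FrameCNN(in_ch=4, embed_dim=embed_dim)    # satellite bands (4 assumed)
        # Learned embeddings that tell the Transformer which modality a token is from;
        # a full model would also add temporal positional encodings.
        self.modality_emb = nn.Parameter(torch.zeros(2, embed_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=n_heads, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, radar: torch.Tensor, sat: torch.Tensor) -> torch.Tensor:
        # radar: (B, T, 1, H, W); sat: (B, T, 4, H, W)
        r = self.radar_cnn(radar) + self.modality_emb[0]
        s = self.sat_cnn(sat) + self.modality_emb[1]
        tokens = torch.cat([r, s], dim=1)  # one token sequence spanning both modalities
        return self.transformer(tokens)    # fused spatiotemporal representation


if __name__ == "__main__":
    enc = MultiModalSTEncoder()
    radar = torch.randn(2, 10, 1, 64, 64)  # 10 past radar echo frames
    sat = torch.randn(2, 10, 4, 64, 64)    # 10 matching satellite frames
    fused = enc(radar, sat)
    print(fused.shape)  # torch.Size([2, 20, 128])
```

The design choice the sketch illustrates is the division of labor the abstract describes: convolutions handle local spatial detail cheaply, while self-attention over the concatenated token sequence gives every frame of either modality a global view of all other frames, which a pure CNN's limited receptive field cannot provide.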