DGMI: A diffusion-based generative adversarial framework for multivariate air quality imputation
Nuo Cheng, Qingjian Ni
Applied Intelligence, Vol. 55, No. 4, published 2025-01-13 (journal article)
DOI: 10.1007/s10489-025-06240-8
https://link.springer.com/article/10.1007/s10489-025-06240-8
Journal metrics: JCR Q2 (Computer Science, Artificial Intelligence); impact factor 3.4
Citations: 0
Abstract
Missing samples are common when monitoring spatiotemporal air quality data, so imputing these missing values is of considerable importance. In recent years, diffusion probabilistic models have played a prominent role in image, video, and text generation, and they have begun to be applied to spatiotemporal data imputation. However, such models still struggle to extract the fine-grained features needed for stable operation and to accurately model the underlying data probability distribution. To address these issues, we propose DGMI, a Diffusion-based Generative adversarial framework for Multivariate air quality data Imputation. Since air quality data show similar change patterns along the temporal, sensor, and indicator dimensions, the framework incorporates a multi-cycle temporal feature extraction module and a sensor-indicator feature extraction module, which jointly refine and integrate temporal, sensor, and indicator information. In addition, missing values are initially encoded using linear interpolation and sine-cosine functions. After the model generates imputed values, a discriminator module assesses the consistency between the imputed and observed values, providing feedback that optimizes the model from a data-distribution perspective. On two real-world air quality datasets and under various missing ratios, DGMI outperforms most current imputation methods by 4.1% in root mean square error (RMSE) and 3.0% in mean absolute error (MAE), demonstrating its effectiveness on multidimensional spatiotemporal data with high missing rates.
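Two steps in the abstract are concrete enough to illustrate: missing values are first initialized (linear interpolation plus a sine-cosine encoding), and imputation quality is reported as RMSE and MAE under various missing ratios. The following is a minimal, stdlib-only sketch under stated assumptions, not the authors' implementation. The helper names `linear_interpolate`, `sinusoidal_encoding`, and `masked_rmse_mae` are hypothetical. Because the abstract does not say what the sine-cosine functions encode, the sketch assumes the standard Transformer-style encoding of a time-step index. Errors are computed only over entries hidden for evaluation, which is the usual convention in imputation benchmarks.

```python
import math
from typing import List, Optional, Sequence, Tuple


def linear_interpolate(series: Sequence[Optional[float]]) -> List[float]:
    """Initialize missing entries (None) by linear interpolation between the
    nearest observed neighbours; leading/trailing gaps copy the nearest
    observed value. Assumed initialization, not DGMI's published code."""
    observed = [i for i, v in enumerate(series) if v is not None]
    if not observed:
        raise ValueError("series has no observed values")
    filled: List[float] = []
    for i, v in enumerate(series):
        if v is not None:
            filled.append(float(v))
            continue
        left = max((j for j in observed if j < i), default=None)
        right = min((j for j in observed if j > i), default=None)
        if left is None:      # leading gap: back-fill from the first observation
            filled.append(float(series[right]))
        elif right is None:   # trailing gap: forward-fill from the last observation
            filled.append(float(series[left]))
        else:                 # interior gap: distance-weighted blend of both sides
            w = (i - left) / (right - left)
            filled.append((1 - w) * series[left] + w * series[right])
    return filled


def sinusoidal_encoding(position: int, dim: int) -> List[float]:
    """Transformer-style sine-cosine encoding of an integer position such as a
    time-step index. Assumption: the abstract does not specify what DGMI
    encodes with sine and cosine, so this standard form is illustrative only."""
    if dim % 2:
        raise ValueError("dim must be even")
    enc: List[float] = []
    for k in range(dim // 2):
        angle = position / (10000 ** (2 * k / dim))
        enc.extend([math.sin(angle), math.cos(angle)])
    return enc


def masked_rmse_mae(truth: Sequence[float], pred: Sequence[float],
                    eval_mask: Sequence[bool]) -> Tuple[float, float]:
    """RMSE and MAE over the entries selected by eval_mask only, i.e. the
    values hidden from the model during evaluation."""
    errs = [p - t for t, p, m in zip(truth, pred, eval_mask) if m]
    if not errs:
        raise ValueError("eval_mask selects no entries")
    rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
    mae = sum(abs(e) for e in errs) / len(errs)
    return rmse, mae


if __name__ == "__main__":
    truth = [10.0, 13.0, 14.0, 19.0, 18.0]      # hypothetical sensor readings
    observed = [10.0, None, 14.0, None, 18.0]   # two readings missing
    init = linear_interpolate(observed)         # [10.0, 12.0, 14.0, 16.0, 18.0]
    mask = [v is None for v in observed]        # score only the hidden entries
    print(masked_rmse_mae(truth, init, mask))   # RMSE = sqrt(5) ~ 2.236, MAE = 2.0
    print(sinusoidal_encoding(0, 4))            # [0.0, 1.0, 0.0, 1.0]
```

In a diffusion-based imputer, an initial fill like this would only be a starting point that the generative model then refines; the masked metrics show why the reported RMSE and MAE gains depend on which entries are held out at each missing ratio.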
About the journal:
With a focus on research in artificial intelligence and neural networks, this journal addresses solutions to real-life manufacturing, defense, management, government, and industrial problems that are too complex to be solved through conventional approaches and that require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance.
The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.