Yangwen Yu;Victor O. K. Li;Jacqueline C. K. Lam;Kelvin Chan
IEEE Transactions on Big Data, vol. 9, no. 5, pp. 1347-1364, published 2023-03-18
DOI: 10.1109/TBDATA.2023.3277710
https://ieeexplore.ieee.org/document/10129038/
Impact factor: 7.5 (JCR Q1, Computer Science, Information Systems)
Citations: 0
Abstract
GCN-ST-MDIR: Graph Convolutional Network-Based Spatial-Temporal Missing Air Pollution Data Pattern Identification and Recovery
Missing data pattern identification and recovery (MDIR) is vital for accurate air pollution monitoring. To recover missing air pollution data, GCN-ST-MDIR, a Graph Convolutional Network (GCN)-based MDIR framework, is proposed to identify daily missing data patterns and automatically select the best recovery method. GCN-ST-MDIR presents four novelties: (1) A new graph construction is developed to improve GCN data representation for MDIR, using a spatial-temporal (S-T) similarity matrix and domain-specific knowledge (e.g., weekend/weekday). (2) A TL component is used to pre-train the LSCE and ILSCE models. (3) A GCN structure outputs a selection indicator that determines the dominant missing pattern for each daily input. The accuracy of the pre-trained data recovery models is incorporated into the GCN loss function to penalize a wrong indicator. (4) The output of the GCN structure is used as a score to combine LSCE and ILSCE. Results show that domain-specific S-T regularity and irregularity can serve as prior information for both the GCN and ILSCE/LSCE to enhance feature extraction. Our model considerably improves recovery performance compared to the baselines. GCN-ST-MDIR achieves an accuracy of 88.48% for general missing data recovery, covering both consecutively and sporadically missing data. GCN-ST-MDIR can be extended to many other S-T MDIR challenges.
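The abstract states that the GCN output is used as a score to combine LSCE and ILSCE, but it does not give the combination rule. The sketch below is one possible reading, not the authors' method: it assumes the score lies in [0, 1] and blends two candidate recoveries as a convex combination at the missing positions only. The function `combine_recoveries`, its parameters, and the sample readings are all hypothetical and chosen for illustration.

```python
from typing import List, Optional


def combine_recoveries(
    observed: List[Optional[float]],
    est_a: List[float],
    est_b: List[float],
    score: float,
) -> List[float]:
    """Blend two candidate recoveries for one day of readings.

    Assumption (not from the paper): score is in [0, 1], where 1 trusts
    est_a fully and 0 trusts est_b fully. Observed values are kept as-is;
    only positions marked None are filled.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    if not (len(observed) == len(est_a) == len(est_b)):
        raise ValueError("all inputs must have the same length")

    out = []
    for obs, a, b in zip(observed, est_a, est_b):
        if obs is not None:
            out.append(obs)  # keep the real measurement
        else:
            out.append(score * a + (1.0 - score) * b)  # convex blend
    return out


# Hypothetical readings: two values are missing in the middle of the day.
day = [10.0, None, None, 13.0]
from_a = [10.0, 11.0, 12.0, 13.0]  # stand-in for one recovery model's output
from_b = [10.0, 15.0, 16.0, 13.0]  # stand-in for the other model's output
blended = combine_recoveries(day, from_a, from_b, score=0.75)
print(blended)
```

With a score of 1 or 0 this reduces to picking one model outright, which matches the "selection indicator" reading in item (3); intermediate scores give a soft mix of the two.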
About the journal:
The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas cover a wide range, such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields generating massive datasets.