LiangQi Zhou, Hongzhen Xu, Li Wei, Quan Zhang, Fei Zhou, Zhuo-Dai Li
{"title":"基于无限高斯贝叶斯和CNN的大气大数据离群点检测","authors":"LiangQi Zhou, Hongzhen Xu, Li Wei, Quan Zhang, Fei Zhou, Zhuo-Dai Li","doi":"10.1145/3318299.3318384","DOIUrl":null,"url":null,"abstract":"Air quality has always been a hot issue of concern to the people, the environmental protection department and the government. Among the massive air quality data, abnormal data can interfere with subsequent experiments and analysis. Therefore, it is necessary to detect abnormal data to improve the accuracy of the data. However, traditional air outlier detection methods require at least one year's data to make inferences about air quality. This paper firstly analyzes the characteristics of air quality big data, and then proposes a framework based on Bayesian non-parametric clustering, namely Dirichlet Process (DP) clustering framework, to realize the outlier detection of air quality. The framework optimizes Gaussian mixture model into infinite Gaussian mixture model according to the results of data analysis, and uses neural network to cluster the data processed by infinite Gaussian mixture model, which effectively improves the clustering accuracy and avoids the need of collecting a large number of training data.","PeriodicalId":164987,"journal":{"name":"International Conference on Machine Learning and Computing","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Air Big Data Outlier Detection Based on Infinite Gauss Bayesian and CNN\",\"authors\":\"LiangQi Zhou, Hongzhen Xu, Li Wei, Quan Zhang, Fei Zhou, Zhuo-Dai Li\",\"doi\":\"10.1145/3318299.3318384\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Air quality has always been a hot issue of concern to the people, the environmental protection department and the government. Among the massive air quality data, abnormal data can interfere with subsequent experiments and analysis. Therefore, it is necessary to detect abnormal data to improve the accuracy of the data. However, traditional air outlier detection methods require at least one year's data to make inferences about air quality. This paper firstly analyzes the characteristics of air quality big data, and then proposes a framework based on Bayesian non-parametric clustering, namely Dirichlet Process (DP) clustering framework, to realize the outlier detection of air quality. The framework optimizes Gaussian mixture model into infinite Gaussian mixture model according to the results of data analysis, and uses neural network to cluster the data processed by infinite Gaussian mixture model, which effectively improves the clustering accuracy and avoids the need of collecting a large number of training data.\",\"PeriodicalId\":164987,\"journal\":{\"name\":\"International Conference on Machine Learning and Computing\",\"volume\":\"2 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-02-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Conference on Machine Learning and Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3318299.3318384\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Machine Learning and Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3318299.3318384","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Air Big Data Outlier Detection Based on Infinite Gauss Bayesian and CNN
Air quality has always been a hot issue of concern to the people, the environmental protection department and the government. Among the massive air quality data, abnormal data can interfere with subsequent experiments and analysis. Therefore, it is necessary to detect abnormal data to improve the accuracy of the data. However, traditional air outlier detection methods require at least one year's data to make inferences about air quality. This paper firstly analyzes the characteristics of air quality big data, and then proposes a framework based on Bayesian non-parametric clustering, namely Dirichlet Process (DP) clustering framework, to realize the outlier detection of air quality. The framework optimizes Gaussian mixture model into infinite Gaussian mixture model according to the results of data analysis, and uses neural network to cluster the data processed by infinite Gaussian mixture model, which effectively improves the clustering accuracy and avoids the need of collecting a large number of training data.