Daewon Yang, Taeryon Choi, Eric Lavigne, Yeonseung Chung
Journal of the Royal Statistical Society, Series C (Applied Statistics), 71(5), pages 1521-1542. Published 2022-08-30. DOI: 10.1111/rssc.12589. https://onlinelibrary.wiley.com/doi/10.1111/rssc.12589
Non-parametric Bayesian covariate-dependent multivariate functional clustering: An application to time-series data for multiple air pollutants
Air pollution is a major threat to public health. Understanding the spatial distribution of air pollution concentrations is of great interest to governments and local authorities, as it identifies target areas for implementing air quality management policies. Cluster analysis has been widely used to identify groups of locations with similar profiles of average levels of multiple air pollutants, efficiently summarising the spatial pattern. This study aims to cluster locations based on the seasonal patterns of multiple air pollutants while incorporating location-specific characteristics such as socio-economic indicators. For this purpose, we propose a novel non-parametric Bayesian sparse latent factor model for covariate-dependent multivariate functional clustering. Furthermore, we extend this model to conduct clustering with temporal dependency. The proposed methods are illustrated through a simulation study and applied to time-series data for daily mean concentrations of ozone ($\mathrm{O}_3$), nitrogen dioxide ($\mathrm{NO}_2$), and fine particulate matter ($\mathrm{PM}_{2.5}$) collected for 25 cities in Canada in 1986–2015.
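The abstract does not give the model specification. As a generic illustration of what "covariate-dependent" clustering can mean, the sketch below uses a logistic stick-breaking construction, a common device in non-parametric Bayesian models, in which a location's covariates shift its prior cluster-membership probabilities. It is not the authors' model: the function names, the truncation to a finite number of clusters, and all numeric values are assumptions made purely for illustration.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def stick_breaking_weights(x, alphas, betas):
    """Covariate-dependent cluster weights via logistic stick-breaking.

    x      : covariate vector for one location (list of floats)
    alphas : intercepts, one per stick (K - 1 sticks give K clusters)
    betas  : coefficient vectors, one per stick, each the same length as x
    Returns a list of K weights that sum to 1.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to a cluster
    for a, b in zip(alphas, betas):
        # Fraction of the remaining stick broken off for this cluster;
        # it depends on the covariates through a linear predictor.
        v = sigmoid(a + sum(bj * xj for bj, xj in zip(b, x)))
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # the last cluster takes what is left
    return weights


# Hypothetical parameters and covariates, for illustration only.
alphas = [0.0, -0.5, 0.3]
betas = [[1.0, -0.5], [0.2, 0.8], [-1.0, 0.4]]
w_low = stick_breaking_weights([-1.0, 0.0], alphas, betas)
w_high = stick_breaking_weights([1.0, 0.0], alphas, betas)
```

Under these assumed values, two locations that differ only in the first covariate receive different prior weights for the same clusters, which is the sense in which cluster membership depends on covariates in this sketch.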
About the journal:
The Journal of the Royal Statistical Society, Series C (Applied Statistics) is a journal of international repute for statisticians both inside and outside the academic world. The journal is concerned with papers which deal with novel solutions to real-life statistical problems by adapting or developing methodology, or by demonstrating the proper application of new or existing statistical methods to them. At their heart, therefore, the papers in the journal are motivated by examples and statistical data of all kinds. The subject matter covers the whole range of interdisciplinary fields, e.g. applications in agriculture, genetics, industry, medicine and the physical sciences, and papers on design issues (e.g. in relation to experiments, surveys or observational studies).
A deep understanding of statistical methodology is not necessary to appreciate the content. Although papers describing developments in statistical computing driven by practical examples are within its scope, the journal is not concerned with simply numerical illustrations or simulation studies. The emphasis of Series C is on case-studies of statistical analyses in practice.