{"title":"Geodesic PCA in the Wasserstein space by Convex PCA","authors":"Jérémie Bigot, R. Gouet, T. Klein, Alfredo López","doi":"10.1214/15-AIHP706","DOIUrl":null,"url":null,"abstract":"We introduce the method of Geodesic Principal Component Analysis (GPCA) on the space of probability measures on the line, with finite second moment, endowed with the Wasserstein metric. We discuss the advantages of this approach, over a standard functional PCA of probability densities in the Hilbert space of square-integrable functions. We establish the consistency of the method by showing that the empirical GPCA converges to its population counterpart, as the sample size tends to infinity. A key property in the study of GPCA is the isometry between the Wasserstein space and a closed convex subset of the space of square-integrable functions, with respect to an appropriate measure. Therefore, we consider the general problem of PCA in a closed convex subset of a separable Hilbert space, which serves as basis for the analysis of GPCA and also has interest in its own right. We provide illustrative examples on simple statistical models, to show the benefits of this approach for data analysis. The method is also applied to a real dataset of population pyramids.","PeriodicalId":7902,"journal":{"name":"Annales De L Institut Henri Poincare-probabilites Et Statistiques","volume":"7 1","pages":"1-26"},"PeriodicalIF":1.2000,"publicationDate":"2017-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"104","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Annales De L Institut Henri Poincare-probabilites Et Statistiques","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1214/15-AIHP706","RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}
引用次数: 104
Abstract
We introduce the method of Geodesic Principal Component Analysis (GPCA) on the space of probability measures on the line, with finite second moment, endowed with the Wasserstein metric. We discuss the advantages of this approach, over a standard functional PCA of probability densities in the Hilbert space of square-integrable functions. We establish the consistency of the method by showing that the empirical GPCA converges to its population counterpart, as the sample size tends to infinity. A key property in the study of GPCA is the isometry between the Wasserstein space and a closed convex subset of the space of square-integrable functions, with respect to an appropriate measure. Therefore, we consider the general problem of PCA in a closed convex subset of a separable Hilbert space, which serves as basis for the analysis of GPCA and also has interest in its own right. We provide illustrative examples on simple statistical models, to show the benefits of this approach for data analysis. The method is also applied to a real dataset of population pyramids.
期刊介绍:
The Probability and Statistics section of the Annales de l’Institut Henri Poincaré is an international journal which publishes high quality research papers. The journal deals with all aspects of modern probability theory and mathematical statistics, as well as with their applications.