A GLRT for estimating the number of correlated components in sample-poor mCCA
Tanuj Hasija, Tim Marrinan
2022 30th European Signal Processing Conference (EUSIPCO), 2022-08-29
DOI: 10.23919/eusipco55093.2022.9909641 (https://doi.org/10.23919/eusipco55093.2022.9909641)
Citations: 0
Abstract
In many applications, components that are correlated across multiple data sets represent meaningful patterns and commonalities. Estimates of these patterns improve when the number of correlated components is known, but because data exploration often occurs in an unsupervised setting, that number is generally unknown. In this paper, we derive a generalized likelihood ratio test (GLRT) for estimating the number of components correlated across multiple data sets. We focus on the scenario where the number of available samples is small. With such small sample support, traditional methods significantly overestimate correlation coefficients and other summary statistics. The proposed test combines linear dimensionality reduction with a GLRT based on a measure of multiset correlation referred to as the generalized variance cost function (mCCA-GENVAR). By jointly estimating the rank of the dimensionality reduction and the number of correlated components, we are able to provide high-accuracy estimates in the challenging sample-poor setting. These advantages are illustrated in numerical experiments that compare the proposed method with existing techniques.
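The overestimation of correlation coefficients under small sample support can be illustrated with a minimal sketch. It is not the proposed GLRT; it only computes ordinary two-set sample canonical correlations for data sets that are truly uncorrelated. The dimensions (10 per set), the sample sizes (25 and 2000), the random seed, and the helper name `sample_canonical_correlations` are assumptions chosen for illustration.

```python
import numpy as np

def sample_canonical_correlations(X, Y):
    """Sample canonical correlations between X (n_x x M) and Y (n_y x M).

    Computed as the cosines of the principal angles between the row
    spaces of the centered data matrices (assumes M > n_x and M > n_y).
    """
    X = X - X.mean(axis=1, keepdims=True)
    Y = Y - Y.mean(axis=1, keepdims=True)
    # Orthonormal bases for the sample-domain subspaces of each data set.
    Qx, _ = np.linalg.qr(X.T)
    Qy, _ = np.linalg.qr(Y.T)
    # Singular values of Qx^T Qy are the sample canonical correlations.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

rng = np.random.default_rng(0)   # illustrative seed
n = 10                           # illustrative dimension of each data set

# The two data sets are independent, so every true canonical correlation is 0.
M_small, M_large = 25, 2000      # illustrative sample sizes
rho_small = sample_canonical_correlations(
    rng.standard_normal((n, M_small)), rng.standard_normal((n, M_small)))
rho_large = sample_canonical_correlations(
    rng.standard_normal((n, M_large)), rng.standard_normal((n, M_large)))

print("largest sample correlation, M = 25:  ", rho_small[0])
print("largest sample correlation, M = 2000:", rho_large[0])
```

With few samples relative to the dimension, the largest sample correlation comes out close to one even though no component is actually correlated. A rank-reduction step before testing is one way to counter this effect.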