Robust Estimation of the Covariance Matrix From Data With Outliers
Authors: Petre Stoica; Prabhu Babu; Piyush Varshney
Journal: IEEE Open Journal of Signal Processing, vol. 5, pp. 1061-1072, published 2024-10-02
DOI: 10.1109/OJSP.2024.3473610
URL: https://ieeexplore.ieee.org/document/10704043/
Robust estimation of the covariance matrix is a frequent task in practical applications in which, more often than not, some data samples are outliers. Several methods can robustly estimate a covariance matrix from corrupted data; a representative example is the minimum covariance determinant (MCD) method. In this paper we present a maximum conditional likelihood interpretation of MCD that provides a new motivation for this method as well as further insights into it. To perform at its best, MCD requires information on the number of outliers in the data, which usually is not available. We propose two new methods for covariance matrix estimation from data with outliers that do not suffer from this problem: TEST (multiple-hypothesis testing method), which uses the FDR (false discovery rate) to test a set of model hypotheses and hence estimate the number of outliers and their locations, and LIKE (penalized likelihood method), which solves the outlier estimation problem using a GIC (generalized information criterion) to penalize the complexity of a high-dimensional data model. We show by means of numerical simulations that the performances of TEST and LIKE are relatively similar to one another as well as to the performance of the oracle MCD (which uses the true number of outliers), and significantly better than the performance of MCD that uses an upper bound on the number of outliers.
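
To make MCD's dependence on the assumed number of outliers concrete, the following is a minimal sketch of the textbook MCD criterion, not the authors' implementation: among all subsets of h = n - k samples, keep the one whose sample covariance has the smallest determinant. The function names (`mean_and_cov`, `det2`, `mcd_exhaustive`), the exhaustive search, the restriction to 2-D data, and the toy data set are all assumptions made for illustration; practical MCD solvers use faster approximate searches.

```python
from itertools import combinations

def mean_and_cov(points):
    """Sample mean and 1/h-normalized covariance of a list of 2-D points."""
    h = len(points)
    mx = sum(p[0] for p in points) / h
    my = sum(p[1] for p in points) / h
    sxx = sum((p[0] - mx) ** 2 for p in points) / h
    syy = sum((p[1] - my) ** 2 for p in points) / h
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / h
    return (mx, my), ((sxx, sxy), (sxy, syy))

def det2(c):
    """Determinant of a 2x2 matrix given as nested tuples."""
    return c[0][0] * c[1][1] - c[0][1] * c[1][0]

def mcd_exhaustive(points, num_outliers):
    """Exhaustive MCD: try every subset of size h = n - num_outliers and
    keep the one with the smallest covariance determinant.
    Only feasible for very small n (hypothetical helper for illustration)."""
    n = len(points)
    h = n - num_outliers
    best = None
    for idx in combinations(range(n), h):
        mean, cov = mean_and_cov([points[i] for i in idx])
        d = det2(cov)
        if best is None or d < best[0]:
            best = (d, idx, mean, cov)
    _, idx, mean, cov = best
    flagged = sorted(set(range(n)) - set(idx))
    return mean, cov, flagged

# Toy data (assumed): eight inliers near the unit square, two gross outliers.
data = [
    (0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
    (0.5, 0.2), (0.2, 0.7), (0.8, 0.6), (0.4, 0.9),
    (10.0, 10.0), (-9.0, 8.0),
]

# "Oracle" setting: the true number of outliers (2) is supplied.
mean, cov, outliers = mcd_exhaustive(data, num_outliers=2)
print("flagged as outliers:", outliers)
print("MCD covariance estimate:", cov)

# Upper-bound setting: assuming 4 outliers also discards two inliers.
_, cov_upper, flagged_upper = mcd_exhaustive(data, num_outliers=4)
print("flagged with upper bound:", flagged_upper)
```

The second call illustrates the issue raised in the abstract: when only an upper bound on the number of outliers is supplied, the search trims genuine samples as well, so the resulting estimate is built from fewer inliers than are actually available.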