Robust Estimation of the Covariance Matrix From Data With Outliers

IEEE Open Journal of Signal Processing (IF 2.9; JCR Q2, Engineering, Electrical & Electronic) | Pub Date: 2024-10-02 | DOI: 10.1109/OJSP.2024.3473610
Petre Stoica; Prabhu Babu; Piyush Varshney
Volume 5, pages 1061-1072. Full text: https://ieeexplore.ieee.org/document/10704043/
Citations: 0

Abstract

The robust estimation of the covariance matrix is a frequent task in practical applications in which, more often than not, some data samples are outliers. Several methods can be used to robustly estimate a covariance matrix from corrupted data, a representative example of which is the minimum covariance determinant (MCD) method. In this paper we present a maximum conditional likelihood interpretation of MCD that provides a new motivation for, as well as further insights into, this method. To perform at its best, MCD requires information on the number of outliers in the data, which is usually not available. We propose two new methods for covariance matrix estimation from data with outliers that do not suffer from this problem: TEST (multiple-hypothesis testing method), which uses the FDR (false discovery rate) to test a set of model hypotheses and hence estimate the number of outliers and their locations, and LIKE (penalized likelihood method), which solves the outlier estimation problem using a GIC (generalized information criterion) to penalize the complexity of a high-dimensional data model. We show by means of numerical simulations that the performances of TEST and LIKE are relatively similar to one another as well as to the performance of the oracle MCD (which uses the true number of outliers), and significantly better than the performance of MCD that uses an upper bound on the outlier number.
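To make the dependence of MCD on the assumed number of outliers concrete, here is a minimal sketch in Python with NumPy. It is not the authors' code and does not reproduce TEST or LIKE. It approximates the MCD subset search with random starts and concentration steps, which is a common heuristic. The function name `mcd_sketch`, its parameters, and the synthetic data (180 Gaussian inliers, 20 shifted outliers, a 2 x 2 true covariance `Sigma`) are all assumptions chosen for illustration.

```python
import numpy as np

def mcd_sketch(X, n_outliers, n_starts=20, n_iter=50, seed=0):
    """Approximate MCD: search for the h = N - n_outliers samples whose
    sample covariance has the smallest determinant (concentration steps)."""
    rng = np.random.default_rng(seed)
    N, p = X.shape
    h = N - n_outliers
    best_logdet, best_idx = np.inf, None
    for _ in range(n_starts):
        idx = rng.choice(N, size=p + 1, replace=False)  # small random start
        for _ in range(n_iter):
            mu = X[idx].mean(axis=0)
            S = np.cov(X[idx], rowvar=False) + 1e-9 * np.eye(p)  # tiny ridge
            diff = X - mu
            # Squared Mahalanobis distance of every sample to (mu, S).
            d2 = np.einsum("ij,ij->i", diff @ np.linalg.inv(S), diff)
            new_idx = np.sort(np.argsort(d2)[:h])  # keep the h closest samples
            if np.array_equal(new_idx, idx):
                break  # subset stopped changing
            idx = new_idx
        sign, logdet = np.linalg.slogdet(np.cov(X[idx], rowvar=False))
        if sign > 0 and logdet < best_logdet:
            best_logdet, best_idx = logdet, idx
    # Raw estimate: no consistency factor or reweighting is applied here.
    return X[best_idx].mean(axis=0), np.cov(X[best_idx], rowvar=False), best_idx

# Hypothetical data: 180 inliers from N(0, Sigma) plus 20 shifted outliers.
rng = np.random.default_rng(1)
Sigma = np.array([[2.0, 0.8], [0.8, 1.0]])
inliers = rng.multivariate_normal([0.0, 0.0], Sigma, size=180)
outliers = rng.multivariate_normal([8.0, -8.0], np.eye(2), size=20)
X = np.vstack([inliers, outliers])

def frob_err(S):
    """Frobenius-norm distance between an estimate and the true covariance."""
    return np.linalg.norm(S - Sigma)

err_sample = frob_err(np.cov(X, rowvar=False))                  # no robustness
mu_oracle, S_oracle, idx_oracle = mcd_sketch(X, n_outliers=20)  # true count
mu_upper, S_upper, idx_upper = mcd_sketch(X, n_outliers=80)     # loose bound
err_oracle = frob_err(S_oracle)
err_upper = frob_err(S_upper)

print(f"sample covariance error: {err_sample:.3f}")
print(f"oracle MCD error:        {err_oracle:.3f}")
print(f"upper-bound MCD error:   {err_upper:.3f}")
```

In runs like this one, the sketch given the true outlier count typically lands closest to `Sigma`. The variant given a loose upper bound discards many inliers and so tends to underestimate the covariance. Library implementations such as scikit-learn's `MinCovDet` add correction and reweighting steps that this sketch omits.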
Source journal: CiteScore 5.30; self-citation rate 0.00%; articles published 0; review time 22 weeks