Cellwise outlier detection in heterogeneous populations

Giorgia Zaccaria, Luis A. García-Escudero, Francesca Greselin, Agustín Mayo-Íscar
arXiv - STAT - Methodology. Published 2024-09-12. DOI: arxiv-2409.07881 (https://doi.org/arxiv-2409.07881)

Abstract

Real-world applications may be affected by outlying values. In the model-based clustering literature, several methodologies have been proposed to detect units that deviate from the majority of the data (rowwise outliers) and trim them from the parameter estimates. However, the discarded observations can encompass valuable information in some observed features. Following the more recent cellwise contamination paradigm, we introduce a Gaussian mixture model for cellwise outlier detection. The proposal is estimated via an Expectation-Maximization (EM) algorithm with an additional step for flagging the contaminated cells of a data matrix and then imputing them, rather than discarding them, before parameter estimation. This procedure adheres to the spirit of the EM algorithm by treating the contaminated cells as missing values. We analyze the performance of the proposed model in comparison with other existing methodologies through a simulation study with different scenarios and illustrate its potential use for clustering, outlier detection, and imputation on three real data sets.
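
The idea of treating flagged cells as missing values and imputing them can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a single Gaussian component with known mean and covariance, it assumes the flags are already given (the paper's flagging rule is not reproduced here), and the function name `impute_flagged_cells` is hypothetical. It only shows the standard Gaussian conditional-mean imputation, E[x_m | x_o] = mu_m + Sigma_mo Sigma_oo^{-1} (x_o - mu_o), that an EM step for missing data typically relies on.

```python
import numpy as np

def impute_flagged_cells(x, flagged, mu, sigma):
    """Replace flagged cells of one observation with their Gaussian conditional mean.

    Hypothetical helper for illustration only.
    x       : (p,) observation
    flagged : (p,) boolean mask, True where the cell is treated as missing
    mu      : (p,) mean of the (assumed known) Gaussian component
    sigma   : (p, p) covariance of the (assumed known) Gaussian component
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    flagged = np.asarray(flagged, dtype=bool)

    out = x.copy()
    if not flagged.any():
        return out                      # nothing to impute
    obs = ~flagged
    if not obs.any():
        out[:] = mu                     # no reliable cells: fall back to the mean
        return out

    s_oo = sigma[np.ix_(obs, obs)]      # covariance among unflagged cells
    s_mo = sigma[np.ix_(flagged, obs)]  # cross-covariance flagged vs unflagged
    # Conditional mean of the flagged cells given the unflagged ones.
    out[flagged] = mu[flagged] + s_mo @ np.linalg.solve(s_oo, x[obs] - mu[obs])
    return out

# Toy example: two correlated features, second cell flagged as contaminated.
mu = np.array([0.0, 0.0])
sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
x = np.array([1.0, 50.0])
flagged = np.array([False, True])

x_imputed = impute_flagged_cells(x, flagged, mu, sigma)
print(x_imputed)
```

In this toy case the unflagged first cell is kept, and the flagged second cell is replaced by a value predicted from the first through the assumed correlation, so the row still contributes information instead of being trimmed entirely. In a mixture setting one would additionally weight such imputations by the component posterior probabilities, which this sketch leaves out.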