Threshold Effects on Outlier Detection: A Comparative Study of MCD and MRCD Estimators in Multivariate Data Analysis

Nafisat Yusuf, Bannister Jerry Zachary
Journal article, Asian Journal of Probability and Statistics, published 2023-11-04. DOI: 10.9734/ajpas/2023/v25i2557. Citations: 0

Abstract

Aims: The aim of this study is to investigate the impact of thresholds on the detection of outliers by comparing the performance of two estimators, the minimum covariance determinant (MCD) and the minimum regularized covariance determinant (MRCD), at different sample sizes. The study uses simulated data generated from the standard normal distribution to assess how varying thresholds affect the ability of these estimators to detect outliers.
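The abstract names the two estimators but does not define them. The standard definitions are summarized below for reference. The subset size $h$, consistency factor $c_\alpha$, target matrix $T$, weight $\rho$ and chi-square cutoff are the conventional choices from the robust-statistics literature. They are assumed here rather than taken from the paper, because the abstract does not report the authors' exact settings.

$$
H^\ast_{\mathrm{MCD}} = \arg\min_{|H| = h} \det S(H), \qquad
\hat{\boldsymbol\mu} = \bar{\mathbf{x}}_{H^\ast}, \qquad
\hat{\boldsymbol\Sigma}_{\mathrm{MCD}} = c_\alpha\, S(H^\ast)
$$

$$
K(H) = \rho\, T + (1-\rho)\, c_\alpha\, S(H), \qquad
H^\ast_{\mathrm{MRCD}} = \arg\min_{|H| = h} \det K(H), \qquad 0 < \rho < 1
$$

$$
d^2(\mathbf{x}_i) = (\mathbf{x}_i - \hat{\boldsymbol\mu})^\top \hat{\boldsymbol\Sigma}^{-1} (\mathbf{x}_i - \hat{\boldsymbol\mu}), \qquad
\mathbf{x}_i \text{ is flagged if } d^2(\mathbf{x}_i) > \chi^2_{p,\,q}
$$

In these formulas:

- $S(H)$ and $\bar{\mathbf{x}}_H$ are the covariance and mean of the $h$ observations in $H$.
- $T$ is a positive-definite target, for example the identity on standardized data.
- $p$ is the dimension.
- $\chi^2_{p,q}$ is the $q$ quantile of the chi-square distribution with $p$ degrees of freedom.

For $p = 2$ this quantile has the closed form $-2\ln(1-q)$. So $q = 0.975$ gives a cutoff of about 7.38, and $q = 0.999$ gives about 13.82.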
Study Design: This study employs a quantitative research design. It involves generating simulated data, applying the MCD and MRCD estimators for outlier detection, and systematically manipulating thresholds and sample size as independent variables.
Place and Duration: The study is conducted entirely with computational tools and does not require a physical location.
Methodology: Simulated data are generated from the standard normal distribution to create a controlled environment for outlier detection experiments. The MCD and MRCD estimators are applied to the simulated data to detect outliers. Because both estimators are robust, distances computed from them make observations that deviate from the bulk of the data stand out. Different thresholds, varying in how extreme a distance must be before an observation is flagged, are applied systematically, and the performance of each estimator is assessed at every threshold level. The study also examines the effect of sample size by using datasets with varying numbers of observations. The R programming language and associated packages are used for data generation, analysis, and visualization.
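The experimental loop described in the Methodology can be sketched as a small simulation. This is a minimal, dependency-free Python sketch, not the authors' R code, and it makes several simplifying assumptions:

- The data are bivariate ($p = 2$), so the 2×2 inverse and the chi-square quantile have closed forms.
- A hypothetical cluster of outliers, shifted by `shift`, is planted in the data.
- The estimator runs plain C-steps from random $h$-subsets and skips FAST-MCD's reweighting step.
- MRCD is approximated by a fixed weight `rho` with an identity target.

The names `simulate`, `fit` and `run` are illustrative.

```python
import math
import random


def mean_cov(points):
    """Mean and covariance (divisor n, as in the MCD objective) of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    return (mx, my), [[sxx, sxy], [sxy, syy]]


def det2(c):
    return c[0][0] * c[1][1] - c[0][1] * c[1][0]


def sq_mahalanobis(p, mu, c):
    """Squared Mahalanobis distance via the closed-form inverse of a 2x2 matrix."""
    dx, dy = p[0] - mu[0], p[1] - mu[1]
    return (c[1][1] * dx * dx - 2 * c[0][1] * dx * dy + c[0][0] * dy * dy) / det2(c)


def chi2_2_quantile(prob):
    """Quantile of the chi-square distribution with 2 degrees of freedom."""
    return -2.0 * math.log(1.0 - prob)


def consistency_factor(alpha):
    """c_alpha = alpha / P(chi2_4 <= chi2_{2,alpha}), closed form for p = 2."""
    q = chi2_2_quantile(alpha)
    return alpha / (1.0 - (1.0 - alpha) * (1.0 + q / 2.0))


def scatter(subset, cf, rho):
    """Center and rho*I + (1-rho)*cf*S(H); rho = 0 gives the raw MCD scatter."""
    mu, s = mean_cov(subset)
    k = [[(1 - rho) * cf * s[i][j] + (rho if i == j else 0.0) for j in range(2)]
         for i in range(2)]
    return mu, k


def fit(points, h, rho, n_starts=20, seed=0):
    """C-steps from random h-subsets; keep the fit with the smallest determinant."""
    rng = random.Random(seed)
    cf = consistency_factor(h / len(points))
    best_det, best = math.inf, None
    for _ in range(n_starts):
        subset = rng.sample(points, h)  # simplification of FAST-MCD's starts
        prev, cur = math.inf, None
        for _ in range(100):
            mu, k = scatter(subset, cf, rho)
            d = det2(k)
            if d >= prev:  # determinant stopped decreasing: converged
                break
            prev, cur = d, (mu, k)
            subset = sorted(points, key=lambda p: sq_mahalanobis(p, mu, k))[:h]
        if prev < best_det:
            best_det, best = prev, cur
    return best


def simulate(n, frac_out=0.1, shift=5.0, seed=1):
    """Hypothetical design: N(0, I) points plus a cluster shifted by `shift`."""
    rng = random.Random(seed)
    n_out = round(n * frac_out)
    clean = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n - n_out)]
    bad = [(rng.gauss(shift, 1), rng.gauss(shift, 1)) for _ in range(n_out)]
    return clean + bad, n - n_out


def run(n, rho, probs=(0.975, 0.99, 0.999), alpha=0.75):
    """Return {threshold: (outliers flagged, clean points flagged)}."""
    points, n_clean = simulate(n, seed=n)
    mu, k = fit(points, int(alpha * n), rho)
    d2 = [sq_mahalanobis(p, mu, k) for p in points]
    out = {}
    for prob in probs:
        flags = [v > chi2_2_quantile(prob) for v in d2]
        out[prob] = (sum(flags[n_clean:]), sum(flags[:n_clean]))
    return out


if __name__ == "__main__":
    for n in (50, 200):
        for name, rho in (("MCD", 0.0), ("MRCD-like", 0.1)):
            print(n, name, run(n, rho))
```

Each `run` call reports, per threshold, how many planted outliers and how many clean points were flagged. Sweeping `n`, `rho` and `probs` therefore mirrors the structure of the study (threshold × sample size × estimator), not its numbers.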
Results: The findings indicate that the choice of threshold significantly affects the performance of the MCD and MRCD estimators in outlier detection. When both estimators use the same threshold, their performance is similar; differences emerge when their thresholds differ. Higher thresholds are shown to identify less extreme outliers, while lower thresholds are effective at identifying more extreme outliers. These results clarify how the two estimators behave in outlier detection and how sensitive they are to threshold choice and sample size.

Conclusion: Our study sheds light on the critical interdependencies among threshold choice, sample size, and the performance of the MCD and MRCD estimators in outlier detection. By conducting a systematic exploration in a controlled environment with simulated data, we have gained insights that can inform both researchers and practitioners in organizational science research.
