Threshold Effects on Outlier Detection: A Comparative Study of MCD and MRCD Estimators in Multivariate Data Analysis
Authors: Nafisat Yusuf, Bannister Jerry Zachary
Journal: Asian Journal of Probability and Statistics
Published: 2023-11-04
DOI: https://doi.org/10.9734/ajpas/2023/v25i2557
Aims: This study investigates how the choice of threshold affects outlier detection by comparing the performance of two estimators, the minimum covariance determinant (MCD) and the minimum regularized covariance determinant (MRCD), at different sample sizes. It uses simulated data generated from the standard normal distribution to assess how varying thresholds affect the ability of these estimators to detect outliers.
Study Design: This study employs a quantitative research design. It involves the generation of simulated data, the application of the MCD and MRCD estimators for outlier detection, and the systematic manipulation of thresholds and sample size as independent variables.
Place and Duration: The study was conducted using computational tools and did not require a physical location.
Methodology: Simulated data are generated from the standard normal distribution to create a controlled environment for outlier detection experiments. The MCD and MRCD estimators are applied to the simulated data to detect outliers; both are sensitive to observations that deviate from the bulk of the data. Thresholds of varying extremeness are applied systematically, and the performance of the estimators is assessed at each threshold level. The study also investigates the impact of sample size on outlier detection by using datasets with varying numbers of observations. The R programming language and associated packages are used for data generation, analysis, and visualization.
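The simulation loop described in the methodology can be sketched roughly as follows. This is not the authors' R code: it is a minimal standard-library Python illustration under several assumptions that the abstract does not state. The data are taken to be bivariate, the threshold is assumed to be a chi-square quantile applied to squared robust distances, the subset size is fixed at 75% of the sample, and `simple_mcd`, `consistency_factor`, and `flag_counts` are hypothetical helper names. The estimator is a bare-bones MCD built from concentration steps with no reweighting or finite-sample correction. MRCD, which additionally regularizes the subset covariance toward a target matrix, is not reproduced here.

```python
import math
import random


def mean_cov(points):
    """Sample mean and 2x2 covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def det2(cov):
    """Determinant of a symmetric 2x2 matrix stored as (sxx, sxy, syy)."""
    sxx, sxy, syy = cov
    return sxx * syy - sxy ** 2


def sq_distances(points, center, cov):
    """Squared Mahalanobis distances of all points from (center, cov)."""
    (mx, my), (sxx, sxy, syy) = center, cov
    det = det2(cov)
    out = []
    for x, y in points:
        dx, dy = x - mx, y - my
        out.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return out


def simple_mcd(points, h, n_starts=20, max_steps=50, seed=0):
    """Simplified MCD: random 3-point starts refined by concentration steps."""
    rng = random.Random(seed)
    best = None  # (determinant, center, cov) of the best h-subset found
    for _ in range(n_starts):
        center, cov = mean_cov(rng.sample(points, 3))
        if det2(cov) <= 1e-12:
            continue  # skip a degenerate (nearly collinear) start
        prev_det = math.inf
        for _ in range(max_steps):
            d2 = sq_distances(points, center, cov)
            order = sorted(range(len(points)), key=d2.__getitem__)
            # Concentration step: refit on the h points closest to the fit.
            center, cov = mean_cov([points[i] for i in order[:h]])
            det = det2(cov)
            if det <= 1e-12 or det >= prev_det - 1e-12:
                break  # determinant stopped decreasing
            prev_det = det
        if det > 1e-12 and (best is None or det < best[0]):
            best = (det, center, cov)
    return best[1], best[2]


def consistency_factor(alpha):
    """Factor that rescales the h-subset covariance for 2-D normal data."""
    q = -2.0 * math.log(1.0 - alpha)  # chi-square(2) quantile at alpha
    chi4_cdf = 1.0 - (1.0 - alpha) * (1.0 + q / 2.0)  # chi-square(4) CDF at q
    return alpha / chi4_cdf


def flag_counts(n, quantiles=(0.90, 0.975, 0.999), seed=1):
    """Number of points flagged at each chi-square(2) quantile threshold."""
    rng = random.Random(seed)
    points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    h = int(0.75 * n)  # assumed subset size
    center, cov = simple_mcd(points, h, seed=seed)
    c = consistency_factor(h / n)
    d2 = [d / c for d in sq_distances(points, center, cov)]
    # For 2 degrees of freedom the chi-square quantile is -2 * ln(1 - q).
    return {q: sum(d > -2.0 * math.log(1.0 - q) for d in d2) for q in quantiles}


if __name__ == "__main__":
    for n in (50, 200, 1000):
        print(n, flag_counts(n))
```

Under these assumptions the cutoff rises with the quantile, so the number of flagged observations can only stay the same or fall as the threshold is raised. Running the loop over several values of `n` shows how the counts scale with sample size.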
Results: The study's findings indicate that the choice of thresholds in data analysis significantly affects the performance of the MCD and MRCD estimators in outlier detection. If the thresholds used for both estimators are the same, their performance is similar. However, differences emerge when thresholds differ from each other. Higher thresholds are shown to identify less extreme outliers, while lower thresholds are effective at identifying more extreme outliers. These results provide insights into the behavior of these estimators in outlier detection scenarios, shedding light on their sensitivity to threshold choices and sample size.
Conclusion: Our study has shed light on the critical interdependencies among threshold choices, sample sizes, and the performance of the minimum covariance determinant (MCD) and minimum regularized covariance determinant (MRCD) estimators in the context of outlier detection. By conducting a systematic exploration in a controlled environment with simulated data, we have gleaned valuable insights that can inform both researchers and practitioners in the field of organizational science research.